ASYMPTOTIC BEHAVIOUR OF THE EMPIRICAL BAYES
University of Technology*^, and Leiden University^^

We consider the asymptotic behaviour of the marginal maximum
likelihood empirical Bayes posterior distribution in a general setting.
First we characterize the set where the maximum marginal likelihood
estimator is located with high probability. Then we provide
oracle-type upper and lower bounds for the contraction rates of
the empirical Bayes posterior. We also show that the hierarchical
Bayes posterior achieves the same contraction rate as the maximum
marginal likelihood empirical Bayes posterior. We demonstrate the
applicability of our general results for various models and prior
distributions by deriving upper and lower bounds for the contraction
rates of the corresponding empirical and hierarchical Bayes posterior
distributions.


1. Introduction. In the Bayesian approach, the whole inference is based
on the posterior distribution, which is proportional to the likelihood times
the prior (in case of dominated models). The task of designing a prior
distribution $\Pi$ on the parameter $\theta \in \Theta$ is difficult and in
large dimensional models cannot be performed in a fully subjective way. It is
therefore common practice to consider a family of prior distributions
$\Pi(\cdot|\lambda)$ indexed by a hyper-parameter $\lambda \in \Lambda$ and to
either put a hyper-prior on $\lambda$ (hierarchical approach) or to choose
$\lambda$ depending on the data, so that $\lambda = \hat\lambda(x_n)$ where
$x_n$ denotes the collection of observations. The latter is referred to as an
empirical Bayes (hereafter EB) approach, see for instance [18]. There are many
ways to select the hyper-parameter $\lambda$ based on the data, in particular
depending on the nature of the hyper-parameter.

Recently [20] have studied the asymptotic behaviour of the posterior
distribution for general empirical Bayes approaches; they provide conditions
to obtain consistency of the EB posterior and, in the case of parametric
models, characterize the behaviour of the maximum marginal likelihood
estimator $\hat\lambda_n \equiv \hat\lambda(x_n)$ (hereafter MMLE), together
with the corresponding posterior distribution $\Pi(\cdot|\hat\lambda_n; x_n)$
on $\theta$. They show that asymptotically the MMLE converges to some oracle
value $\lambda_0$ which maximizes, in $\lambda$, the prior density calculated
at the true value $\theta_0$ of the parameter,
$\pi(\theta_0|\lambda_0) = \sup\{\pi(\theta_0|\lambda), \lambda \in \Lambda\}$,
where the density is with respect to Lebesgue measure. This cannot be
directly extended to the nonparametric setup, since in this case, typically
the prior distributions $\Pi(\cdot|\lambda)$, $\lambda \in \Lambda$, are not
absolutely continuous with respect to a fixed measure. In the nonparametric
setup the asymptotic behaviour of the MMLE and its associated EB posterior
distribution has been studied in the (inverse) white noise model under various
families of Gaussian prior processes by [3, 10, 16, 30, 31], in the
nonparametric regression problem with smoothing spline priors [26] and
rescaled Brownian motion prior [28], and in a sparse setting by [14]. In all
these papers, the results have been obtained via explicit expression of the
marginal likelihood. Interesting phenomena have been observed in these
specific cases. In [30] an infinite dimensional Gaussian prior was considered
with fixed regularity parameter $\alpha$ and a scaling hyper-parameter $\tau$.
Then it was shown that the scaling parameter can compensate for possible
mismatch of the base regularity $\alpha$ of the prior distribution and the
regularity $\beta$ of the true parameter of interest up to a certain limit.
However, too smooth a truth can only be recovered sub-optimally by the MMLE
empirical Bayes method with rescaled Gaussian priors. In contrast to this, in
[16] it was shown that by substituting the MMLE of the regularity
hyper-parameter into the posterior, one can get the optimal contraction rate
(up to a $\log n$ factor) for every Sobolev regularity class, simultaneously.

In this paper we are interested in generalizing the specific results of [16]
(in the direct case) and [30] to more general models, shedding light on what
is driving the asymptotic behaviour of the MMLE in nonparametric or large
dimensional models. We also provide sufficient conditions to derive posterior
concentration rates for EB procedures based on the MMLE. Finally we
investigate the relationship between the MMLE empirical Bayes and
hierarchical Bayes approaches. We show that the hierarchical Bayes posterior
distribution (under mild conditions on the hyper-prior distribution) achieves
the same contraction rate as the MMLE empirical Bayes posterior distribution.
Note that our results do not answer the question whether empirical
Bayes and hierarchical Bayes posterior distributions are strongly merging,
which is certainly of interest, but would typically require a much more
precise analysis of the posterior distributions.

More precisely, set $x_n$ the vector of observations and assume that
conditionally on some parameter $\theta \in \Theta$, $x_n$ is distributed
according to $P_\theta^n$ with density $p_\theta^n$ with respect to some given
measure $\mu$. Let $\Pi(\cdot|\lambda)$, $\lambda \in \Lambda$, be a family of
prior distributions on $\Theta$. Then the associated posterior distributions
are equal to

    $\Pi(B|x_n; \lambda) = \frac{\int_B p_\theta^n(x_n)\,d\Pi(\theta|\lambda)}{m(x_n|\lambda)}, \qquad m(x_n|\lambda) = \int_\Theta p_\theta^n(x_n)\,d\Pi(\theta|\lambda)$

for all $\lambda \in \Lambda$ and any Borel subset $B$ of $\Theta$. The MMLE
is defined as

(1.1)    $\hat\lambda_n \in \operatorname{argmax}_{\lambda \in \Lambda_n} m(x_n|\lambda)$

for some $\Lambda_n \subseteq \Lambda$, and the associated EB posterior
distribution by $\Pi(\cdot|x_n, \hat\lambda_n)$. We note that in case there
are multiple maximizers one can take an arbitrary one. Furthermore, for
practical reasons (both computational and technical) we allow the maximizer
to be taken on the subset $\Lambda_n \subset \Lambda$.
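
As an illustration of definition (1.1), the following minimal sketch computes an MMLE by grid search. It assumes a Gaussian sequence model $x_i = \theta_i + n^{-1/2} Z_i$ with a centred Gaussian prior $\theta_i \sim N(0, \tau^2 i^{-2\alpha-1})$, so that marginally $x_i \sim N(0, \tau^2 i^{-2\alpha-1} + 1/n)$. The function names `log_marginal` and `mmle`, and the finite grid standing in for $\Lambda_n$, are hypothetical choices made for this example only.

```python
import math


def log_marginal(x, n, tau, alpha):
    """Log marginal likelihood of x when x_i ~ N(0, tau^2 i^(-2 alpha - 1) + 1/n).

    Assumed model (not taken from the text): Gaussian sequence model with a
    centred Gaussian prior on each coordinate.
    """
    total = 0.0
    for i, xi in enumerate(x, start=1):
        var = tau ** 2 * i ** (-2.0 * alpha - 1.0) + 1.0 / n
        total += -0.5 * math.log(2.0 * math.pi * var) - 0.5 * xi * xi / var
    return total


def mmle(x, n, alpha, tau_grid):
    """Return one maximizer of the marginal likelihood over a finite grid.

    The grid plays the role of the subset Lambda_n; ties are broken by taking
    the first maximizer, which is allowed since any maximizer may be chosen.
    """
    return max(tau_grid, key=lambda tau: log_marginal(x, n, tau, alpha))
```

For data concentrated near zero the smallest scale on the grid is selected, while a single large observation pushes the selected scale upwards.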

Our aim is twofold: first to characterize the asymptotic behaviour of
$\hat\lambda_n$ and second to derive posterior concentration rates in such
models, i.e. to determine sequences $\varepsilon_n$ going to 0 such that

(1.2)    $\Pi\left(\theta : d(\theta, \theta_0) \le \varepsilon_n \,|\, x_n, \hat\lambda_n\right) \to 1$

in probability under $P_{\theta_0}^n$, with $\theta_0 \in \Theta$ and
$d(\cdot,\cdot)$ some appropriate positive loss function on $\Theta$
(typically a metric or semi-metric, see condition (A2) later for a more
precise description). There is now a substantial literature on
posterior concentration rates in large or infinite dimensional models initiated
by the seminal paper of [12]. Most results, however, deal with fully Bayesian
posterior distributions, i.e. associated to priors that are not data dependent.
The literature on EB posterior concentration rates deals mainly with specific
models and specific priors.

Recently, in [9], sufficient conditions are provided for deriving general EB
posterior concentration rates when it is known that $\hat\lambda_n$ belongs to
a well chosen subset $\Lambda_0$ of $\Lambda$. In essence, their result boils
down to controlling
$\sup_{\lambda \in \Lambda_0} \Pi(d(\theta, \theta_0) > \varepsilon_n | x_n, \lambda)$.
Hence either $\lambda$ has very little influence on the posterior
concentration rate and it is not so important to characterize $\Lambda_0$
precisely, or $\lambda$ is influential and it becomes crucial to determine
$\Lambda_0$ properly. In [9], the authors focus on the former. In this paper
we are mainly concerned with the latter, with $\hat\lambda_n$ the MMLE. Since
the MMLE is an implicit estimator (as opposed to the moment estimates
considered in [9]) the main difficulty here is to understand what the set
$\Lambda_0$ is.

We show in this paper that $\Lambda_0$ can be characterized roughly as

    $\Lambda_0 = \{\lambda \in \Lambda_n : \varepsilon_n(\lambda) \le M_n \varepsilon_{n,0}\}$

for any sequence $M_n$ going to infinity and with
$\varepsilon_{n,0} = \inf\{\varepsilon_n(\lambda); \lambda \in \Lambda_n\}$
and $\varepsilon_n(\lambda)$ satisfying

(1.3)    $\Pi\left(\|\theta - \theta_0\| \le K\varepsilon_n(\lambda) \,|\, \lambda\right) = e^{-n\varepsilon_n(\lambda)^2},$

with $(\Theta, \|\cdot\|)$ a Banach space and for some large enough constant
$K$ (in the notation we omitted the dependence of $\varepsilon_n(\lambda)$ on
$K$ and $\theta_0$). We then prove that the concentration rate of the MMLE
empirical Bayes posterior distribution is of order
$O(M_n \varepsilon_{n,0})$. We also show that the preceding rates are sharp,
i.e. the posterior contraction rate is bounded from below by
$\delta_n \varepsilon_{n,0}$ (for arbitrary $\delta_n = o(1)$). Hence our
results reveal the exact posterior contraction rates for every individual
$\theta_0 \in \Theta$. Furthermore, we also show that the hierarchical Bayes
method behaves similarly, i.e. the hierarchical posterior has the same upper
($M_n \varepsilon_{n,0}$) and lower ($\delta_n \varepsilon_{n,0}$) bounds on
the contraction rate for every $\theta_0 \in \Theta$ as the MMLE empirical
Bayes posterior.

Our aim is not so much to advocate the use of the MMLE empirical Bayes
approach, but rather to understand its behaviour. Interestingly, our results
show that it is driven by the choice of the prior family
$\{\Pi(\cdot|\lambda), \lambda \in \Lambda\}$ in the neighbourhood of the true
parameter $\theta_0$. This allows one to determine a priori which family of
prior distributions will lead to well behaved MMLE empirical Bayes posteriors
and which won't. In certain cases, however, the computation of the MMLE is
very challenging. Therefore it would be interesting to investigate other
types of estimators for the hyper-parameters, like the cross validation
estimator. At the moment there is only a limited number of papers on this
topic and only for specific models and priors, see for instance [28, 29].

These results are summarized in Theorem 2.1, in Corollary 2.1, and in
Theorem 2.3, in Section 2. Then three different types of priors on
$\Theta = \ell_2 = \{(\theta_i)_{i \in \mathbb{N}} : \sum_i \theta_i^2 < +\infty\}$
are studied, for which upper bounds on $\varepsilon_n(\lambda)$ are
given in Section 3.1. We apply these results to three different sampling
models: the Gaussian white noise, the regression and the estimation of the
density based on iid data models in Sections 3.5 and 3.6. Proofs are postponed
to Section 4, to the appendix for those concerned with the determination of
$\varepsilon_n(\lambda)$ and to the Supplementary material [25].

1.1. Notations and setup. We assume that the observations
$x_n \in \mathcal{X}_n$ (where $\mathcal{X}_n$ denotes the sample space) are
distributed according to a distribution $P_\theta^n$ (they are not necessarily
i.i.d.), with $\theta \in \Theta$, where $(\Theta, \|\cdot\|)$ is a
Banach space. We denote by $\mu$ a dominating measure and by $p_\theta^n$ and
$E_\theta^n$ the corresponding density and expected value of $P_\theta^n$,
respectively. We consider the family of prior distributions
$\{\Pi(\cdot|\lambda), \lambda \in \Lambda\}$ on $\Theta$ with
$\Lambda \subset \mathbb{R}^d$ for some $d \ge 1$ and we denote by
$\Pi(\cdot|x_n; \lambda)$ the associated posterior distributions.

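Throughout the paper $K(\theta_0, \theta)$ denotes the Kullback-Leibler
divergence between $P_{\theta_0}^n$ and $P_\theta^n$ for all
$\theta, \theta_0 \in \Theta$ while $V_2(\theta_0, \theta)$ denotes the
centered second moment of the log-likelihood:

    $K(\theta_0, \theta) = \int_{\mathcal{X}_n} p_{\theta_0}^n(x_n) \log\left(\frac{p_{\theta_0}^n}{p_\theta^n}(x_n)\right) d\mu(x_n),$

    $V_2(\theta_0, \theta) = E_{\theta_0}^n\left(\left|\ell_n(\theta_0) - \ell_n(\theta) - K(\theta_0, \theta)\right|^2\right)$

with $\ell_n(\theta) = \log p_\theta^n(x_n)$. As in [13], we define the
Kullback-Leibler neighbourhoods of $\theta_0$ as

    $B(\theta_0, \varepsilon, 2) = \{\theta;\ K(\theta_0, \theta) \le n\varepsilon^2,\ V_2(\theta_0, \theta) \le n\varepsilon^2\}$

and note that in the above definition $V_2(\theta_0, \theta) \le n\varepsilon^2$
can be replaced by $V_2(\theta_0, \theta) \le Cn\varepsilon^2$ for any
positive constant $C$ without changing the results.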
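
To make the quantities $K$, $V_2$ and the Kullback-Leibler neighbourhood concrete, here is a small sketch under an assumed Gaussian sequence model $x_i = \theta_i + n^{-1/2} Z_i$ with finitely many coordinates. In that model $\ell_n(\theta_0) - \ell_n(\theta) = \frac{n}{2}\|\theta_0 - \theta\|_2^2 + \sqrt{n}\sum_i Z_i(\theta_{0,i} - \theta_i)$, so $K(\theta_0,\theta) = n\|\theta_0-\theta\|_2^2/2$ and $V_2(\theta_0,\theta) = n\|\theta_0-\theta\|_2^2$. The function names are hypothetical.

```python
def kl_and_v2(theta0, theta, n):
    """Return (K, V2) in the assumed Gaussian sequence model with noise level n^(-1/2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(theta0, theta))
    return 0.5 * n * sq_dist, n * sq_dist


def in_kl_neighbourhood(theta0, theta, n, eps):
    """Check membership of theta in B(theta0, eps, 2) for the assumed model."""
    k, v2 = kl_and_v2(theta0, theta, n)
    return k <= n * eps ** 2 and v2 <= n * eps ** 2
```

In this particular model the neighbourhood is simply the Euclidean ball $\{\|\theta - \theta_0\|_2 \le \varepsilon\}$, since the constraint on $V_2$ is the binding one.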

For any subset $A \subset \Theta$ and $\varepsilon > 0$, we denote by
$\log N(\varepsilon, A, d(\cdot,\cdot))$ the $\varepsilon$-entropy of $A$ with
respect to the (pseudo) metric $d(\cdot,\cdot)$, i.e. the logarithm of
the covering number of $A$ by $d(\cdot,\cdot)$ balls of radius $\varepsilon$.

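We also write

    $\bar m(x_n|\lambda) = \frac{m(x_n|\lambda)}{p_{\theta_0}^n(x_n)} = \frac{\int_\Theta p_\theta^n(x_n)\,d\Pi(\theta|\lambda)}{p_{\theta_0}^n(x_n)}.$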


For any bounded function $f$, $\|f\|_\infty = \sup_x |f(x)|$ and if
$\varphi$ denotes a countable collection of functions
$(\varphi_i, i \in \mathbb{N})$, then
$\|\varphi\|_\infty = \max_i \|\varphi_i\|_\infty$. If the function is
integrable then $\|f\|_1$ denotes its $L_1$ norm while $\|f\|_2$ its $L_2$
norm and if
$\theta \in \ell_r = \{\theta = (\theta_i)_{i \in \mathbb{N}},\ \sum_i |\theta_i|^r < +\infty\}$,
with $r \ge 1$, $\|\theta\|_r = \left(\sum_i |\theta_i|^r\right)^{1/r}$.

Throughout the paper $x_n \lesssim y_n$ means that there exists a constant $C$
such that for $n$ large enough $x_n \le Cy_n$, similarly with
$x_n \gtrsim y_n$, and $x_n \asymp y_n$ is equivalent to
$y_n \lesssim x_n \lesssim y_n$. For equivalent (abbreviated) notation we use
the symbol $\equiv$.

2. Asymptotic behaviour of the MMLE, its associated posterior
distribution and the hierarchical Bayes method. Although the
problem can be formulated as a classical parametric maximum likelihood
estimation problem, since $\lambda$ is finite dimensional, its study is more
involved than for the usual regular models due to the complicated nature of
the marginal likelihood. Indeed $m(x_n|\lambda)$ is an integral over an
infinite (or large) dimensional space.

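For $\theta_0 \in \Theta$ denoting the true parameter, define the sequence
$\varepsilon_n(\lambda) \equiv \varepsilon_n(\lambda, \theta_0, K)$ as

(2.1)    $\Pi\left(\theta : \|\theta - \theta_0\| \le K\varepsilon_n(\lambda) \,|\, \lambda\right) = e^{-n\varepsilon_n(\lambda)^2},$

for some positive parameter $K > 0$. If the cumulative distribution function
of $\|\theta - \theta_0\|$ under $\Pi(\cdot|\lambda)$ is not continuous, then
the definition of $\varepsilon_n(\lambda)$ can be replaced by

(2.2)    $c_0^{-1} n\varepsilon_n(\lambda)^2 \le -\log \Pi\left(\theta : \|\theta - \theta_0\| \le K\varepsilon_n(\lambda) \,|\, \lambda\right) \le c_0 n\varepsilon_n(\lambda)^2,$

for some $c_0 \ge 1$ under the assumption that such a sequence
$\varepsilon_n(\lambda)$ exists.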
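
The implicit definition (2.1) can be solved numerically in simple cases. The sketch below is only an illustration under strong assumptions: a scalar parameter with a $N(0, s^2)$ prior, so that the prior mass of $\{|\theta - \theta_0| \le K\varepsilon\}$ is a difference of two normal distribution functions. Since $-\log$ of that mass decreases in $\varepsilon$ while $n\varepsilon^2$ increases, the root is unique and bisection applies. All names (`ball_mass`, `solve_eps`) and default values are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ball_mass(eps, theta0, scale, K):
    """Prior mass of {|theta - theta0| <= K * eps} under an assumed N(0, scale^2) prior."""
    upper = normal_cdf((theta0 + K * eps) / scale)
    lower = normal_cdf((theta0 - K * eps) / scale)
    return upper - lower


def solve_eps(n, theta0=0.5, scale=1.0, K=1.0, iters=200):
    """Bisection for the root of -log(ball_mass(eps)) = n * eps^2, as in (2.1)."""

    def too_small(eps):
        # True when the small-ball exponent still exceeds n * eps^2.
        mass = ball_mass(eps, theta0, scale, K)
        return mass <= 0.0 or -math.log(mass) > n * eps * eps

    lo, hi = 0.0, 1.0
    while too_small(hi):  # enlarge the bracket if needed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if too_small(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In this parametric toy case the solution behaves like $\sqrt{(\log n)/n}$ up to constants, which is the regime where, as discussed later in the text, the general results are too crude to be informative.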

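Roughly speaking, under the assumptions stated below,
$\log m(x_n|\lambda) \asymp n\varepsilon_n^2(\lambda)$ and
$\varepsilon_n(\lambda)$ is the posterior concentration rate associated to the
prior $\Pi(\cdot|\lambda)$, and the best possible (oracle) posterior
concentration rate over $\lambda \in \Lambda_n$ is denoted

    $\varepsilon_{n,0}^2 = \inf_{\lambda \in \Lambda_n}\left\{\varepsilon_n(\lambda)^2 : \varepsilon_n(\lambda)^2 \ge m_n(\log n)/n\right\} \vee m_n(\log n)/n,$

with any sequence $m_n$ tending to infinity.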

With the help of the oracle value $\varepsilon_{n,0}$ we define a set of
hyper-parameters with similar properties, as:

(2.3)    $\Lambda_0 \equiv \Lambda_0(M_n) \equiv \Lambda_{0,n}(K, \theta_0, M_n) = \{\lambda \in \Lambda_n : \varepsilon_n(\lambda) \le M_n \varepsilon_{n,0}\},$

with any sequence $M_n$ going to infinity. We show that under general (and
natural) assumptions the marginal maximum likelihood estimator
$\hat\lambda_n$ belongs to the set $\Lambda_0$ with probability tending to
one, for some constant $K > 0$ large enough. The parameter $K$ provides extra
flexibility to the approach and simplifies the proofs of the upcoming
conditions in certain examples. In practice (at least in the examples we have
studied) the constant $K$ essentially modifies $\varepsilon_n(\lambda)$ by a
multiplicative constant and thus does not modify the final posterior
concentration rate, nor the set $\Lambda_0$, since $M_n$ is any sequence
going to infinity. Note that our results are only meaningful in cases where
$\varepsilon_n(\lambda)$ defined by (2.2) varies with $\lambda$.

We now give general conditions under which the MMLE is inside the set
$\Lambda_0$ with probability going to 1 under $P_{\theta_0}^n$. Using [9], we
will then deduce that the concentration rate of the associated MMLE empirical
Bayes posterior distribution is bounded by $M_n \varepsilon_{n,0}$.
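
The oracle rate and the set in (2.3) are straightforward to evaluate once the values $\varepsilon_n(\lambda)^2$ are available on a finite $\Lambda_n$. The following sketch assumes they are supplied as a dictionary; the function names are hypothetical. One convention is an assumption of this example rather than part of the text: if no value reaches the floor $m_n(\log n)/n$, the floor itself is returned.

```python
import math


def oracle_rate_sq(eps_sq, n, m_n):
    """Oracle squared rate over a finite Lambda_n.

    eps_sq maps each hyper-parameter value to eps_n(lambda)^2. Values below the
    floor m_n * log(n) / n are discarded before taking the infimum.
    """
    floor = m_n * math.log(n) / n
    admissible = [v for v in eps_sq.values() if v >= floor]
    # Assumption of this sketch: fall back to the floor when nothing is admissible.
    return min(admissible) if admissible else floor


def oracle_set(eps_sq, n, m_n, M_n):
    """Hyper-parameters with eps_n(lambda) <= M_n * eps_{n,0}, as in (2.3)."""
    eps0_sq = oracle_rate_sq(eps_sq, n, m_n)
    return {lam for lam, v in eps_sq.items() if v <= M_n ** 2 * eps0_sq}
```

Note that hyper-parameters whose rate falls below the floor still belong to the oracle set, since membership only requires an upper bound on $\varepsilon_n(\lambda)$.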

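Following [20] and [9] we construct for all
$\lambda, \lambda' \in \Lambda_n$ a transformation
$\psi_{\lambda,\lambda'} : \Theta \to \Theta$ such that if
$\theta \sim \Pi(\cdot|\lambda)$ then
$\psi_{\lambda,\lambda'}(\theta) \sim \Pi(\cdot|\lambda')$, and for a
given sequence $u_n \to 0$ we introduce the notation

(2.4)    $q_{\lambda,n}^\theta(x_n) = \sup_{\rho(\lambda,\lambda') \le u_n} p_{\psi_{\lambda,\lambda'}(\theta)}^n(x_n),$

where $\rho : \Lambda_n \times \Lambda_n \to \mathbb{R}^+$ is some loss
function and $Q_{\lambda,n}^\theta$ the associated measure. Denote by
$N_n(\Lambda_0)$, $N_n(\Lambda_n \setminus \Lambda_0)$, and $N_n(\Lambda_n)$
the covering numbers of $\Lambda_0$, $\Lambda_n \setminus \Lambda_0$ and
$\Lambda_n$ by balls of radius $u_n$, respectively, with respect to the
loss function $\rho$.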
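
For a one-dimensional hyper-parameter the covering numbers above are elementary. The sketch below assumes that the set to be covered is an interval $[a, b]$ and that $\rho(\lambda, \lambda') = |\lambda - \lambda'|$; both are assumptions made for illustration, and the function names are hypothetical. It shows in particular that a polynomially fine mesh $u_n = n^{-d}$ only costs an entropy of order $\log n$.

```python
import math


def covering_number(a, b, u):
    """Number of balls of radius u (intervals of length 2u) needed to cover [a, b]."""
    return max(1, math.ceil((b - a) / (2.0 * u)))


def log_covering_number(a, b, n, d):
    """Entropy of [a, b] for the assumed mesh size u_n = n^(-d)."""
    return math.log(covering_number(a, b, n ** (-d)))
```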

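We consider the following set of assumptions to bound
$\sup_{\lambda \in \Lambda_n \setminus \Lambda_0} m(x_n|\lambda)$
from above.

• (A1) There exists $N > 0$ such that for all
$\lambda \in \Lambda_n \setminus \Lambda_0$ and $n \ge N$,
there exists $\Theta_n(\lambda) \subset \Theta$ such that

(2.5)    $\sup_{\{\|\theta - \theta_0\| \le K\varepsilon_n(\lambda)\} \cap \Theta_n(\lambda)} \frac{\log Q_{\lambda,n}^\theta(\mathcal{X}_n)}{n\varepsilon_n^2(\lambda)} = o(1),$

and such that

(2.6)    $\int_{\Theta_n(\lambda)^c} Q_{\lambda,n}^\theta(\mathcal{X}_n)\,d\Pi(\theta|\lambda) \le e^{-w_n^2 n \varepsilon_{n,0}^2},$

for some positive sequence $w_n$ going to infinity.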

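• (A2) [tests] There exist $0 < \zeta, c_1 < 1$ such that for all
$\lambda \in \Lambda_n \setminus \Lambda_0$ and all
$\theta \in \Theta_n(\lambda)$, there exist tests $\varphi_n(\theta)$ such that

(2.7)    $E_{\theta_0}^n \varphi_n(\theta) \le e^{-c_1 n d^2(\theta, \theta_0)}, \qquad \sup_{d(\theta,\theta') \le \zeta d(\theta,\theta_0)} Q_{\lambda,n}^{\theta'}\left(1 - \varphi_n(\theta)\right) \le e^{-c_1 n d^2(\theta, \theta_0)},$

where $d(\cdot,\cdot)$ is a semi-metric satisfying

(2.8)    $\Theta_n(\lambda) \cap \{\|\theta - \theta_0\| > K\varepsilon_n(\lambda)\} \subset \Theta_n(\lambda) \cap \{d(\theta, \theta_0) > c(\lambda)\varepsilon_n(\lambda)\}$

for some $c(\lambda) \ge w_n \varepsilon_{n,0}/\varepsilon_n(\lambda)$ and

(2.9)    $\log N\left(\zeta u, \{u \le d(\theta, \theta_0) \le 2u\} \cap \Theta_n(\lambda), d(\cdot,\cdot)\right) \le c_1 n u^2/2$

for all $u \ge c(\lambda)\varepsilon_n(\lambda)$.

Remark 2.1. We note that we can weaken (2.5) to

    $\sup_{\{\|\theta - \theta_0\| \le \varepsilon_n(\lambda)\} \cap \Theta_n(\lambda)} Q_{\lambda,n}^\theta(\mathcal{X}_n) \le e^{cn\varepsilon_n^2(\lambda)}$

for some positive constant $c < 1$ in case the cumulative distribution
function of $\|\theta - \theta_0\|$ under $\Pi(\cdot|\lambda)$ is continuous
and hence the definition (2.1) is meaningful.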

Conditions (2.5) and (2.6) imply that we can control the small perturbations
of the likelihood $p_{\psi_{\lambda,\lambda'}(\theta)}^n(x_n)$ due to the
change of measures $\psi_{\lambda,\lambda'}$, and are similar to those used
in [9]. They allow us to control $m(x_n|\lambda)$ uniformly over
$\Lambda_n \setminus \Lambda_0$. They are rather weak conditions since $u_n$
can be chosen very small. In [9], the authors show that they hold even with
complex priors such as nonparametric mixture models. Assumption (A2) (2.7),
together with (2.9), has been verified in many contexts, with the difference
that here the tests need to be performed with respect to the perturbed
likelihoods $q_{\lambda,n}^\theta$. Since the $u_n$-mesh of
$\Lambda_n \setminus \Lambda_0$ can be very fine, these perturbations can
be well controlled over the sets $\Theta_n(\lambda)$, see for instance [9] in
the context of density estimation or intensity estimation of Aalen point
processes. The interest of the above conditions is that they are very similar
to standard conditions considered in the posterior concentration rates
literature, starting with [12] and [13], so that there is a large literature
on such types of conditions which can be applied in the present setting.
Therefore, the usual variations on these conditions can be considered. For
instance an alternative condition to (A2) is:

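(A2 bis) There exists $0 < \zeta < 1$ such that for all
$\lambda \in \Lambda_n \setminus \Lambda_0$ and all
$\theta \in \Theta_n(\lambda)$, there exist tests $\varphi_n(\theta)$ such
that (2.7) is verified and for all $j \ge K$, writing

    $B_{n,j}(\lambda) = \Theta_n(\lambda) \cap \{j\varepsilon_n(\lambda) \le \|\theta - \theta_0\| < (j+1)\varepsilon_n(\lambda)\},$

then

    $B_{n,j}(\lambda) \subset \Theta_n(\lambda) \cap \{d(\theta, \theta_0) > c(\lambda, j)\varepsilon_n(\lambda)\}$

with

    $\sum_{j \ge K} \exp\left(-\frac{c_1}{2} n c(\lambda, j)^2 \varepsilon_n(\lambda)^2\right) \lesssim e^{-n w_n^2 \varepsilon_{n,0}^2}$

and

    $\log N\left(\zeta c(\lambda, j)\varepsilon_n(\lambda), B_{n,j}(\lambda), d(\cdot,\cdot)\right) \le \frac{c_1 n c(\lambda, j)^2 \varepsilon_n(\lambda)^2}{2}.$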
Here the difficulty lies in the comparison between the metric $\|\cdot\|$ of
the Banach space and the testing distance $d(\cdot,\cdot)$, in condition
(2.8). Outside the white noise model, where the Kullback-Leibler divergence
and other moments of the likelihood ratio are directly linked to the $L_2$
norm on $\theta - \theta_0$, such a comparison may be non-trivial. In van der
Vaart and van Zanten [33], the prior had some natural Banach structure and
norm, which was possibly different from the Kullback-Leibler divergence and
the testing distance, but comparable in some sense. Our approach is similar
in spirit. We illustrate this here in the special cases of
regression function and density estimation under different families of priors,
see Sections 3.5 and 3.6.1. In Section 3.6.2 we use a prior which is not so
much driven by a Banach structure and the norm $\|\cdot\|$ is replaced by the
Hellinger distance. Hence in full generality $\|\cdot\|$ could be replaced by
any metric, for instance the testing metric $d(\cdot,\cdot)$, as long as the
rates $\varepsilon_n(\lambda)$ can be computed.

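The following assumption is used to bound
$\sup_{\lambda \in \Lambda_0} m(x_n|\lambda)$ from below.

• (B1) There exist $\tilde\Lambda_0 \subset \Lambda_0$ and $M_2 \ge 1$ such
that for every $\lambda \in \tilde\Lambda_0$

    $\{\|\theta - \theta_0\| \le K\varepsilon_n(\lambda)\} \subset B(\theta_0, M_2\varepsilon_n(\lambda), 2),$

and such that there exists $\lambda_0 \in \tilde\Lambda_0$ for which
$\varepsilon_n(\lambda_0) \le M_1\varepsilon_{n,0}$ for some positive $M_1$.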


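Remark 2.2. A variation of (B1) can be considered where
$\{\|\theta - \theta_0\| \le K\varepsilon_n(\lambda)\}$ is replaced by
$\{\|\theta - \theta_0\| \le K\varepsilon_n(\lambda)\} \cap \Theta_n(\lambda)$,
where $\Theta_n(\lambda) \subset \Theta$ verifies

    $\Pi\left(\{\|\theta - \theta_0\| \le K\varepsilon_n(\lambda)\} \cap \Theta_n(\lambda) \,|\, \lambda\right) \gtrsim e^{-K_2 n \varepsilon_n^2(\lambda)}$

for some $K_2 \ge 1$. This is used in Section 3.6.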


2.1. Asymptotic behaviour of the MMLE and empirical Bayes posterior
concentration rate. We now present the two main results of this section,
namely the asymptotic behaviour of the MMLE and the concentration rate of
the resulting empirical Bayes posterior. We first describe the asymptotic
behaviour of $\hat\lambda_n$.


Theorem 2.1. Assume that there exists $K > 0$ such that conditions
(A1), (A2), and (B1) hold with $w_n = o(M_n)$. Then, if
$\log N_n(\Lambda_n \setminus \Lambda_0) = o(n w_n^2 \varepsilon_{n,0}^2)$,

    $\lim_{n \to \infty} P_{\theta_0}^n\left(\hat\lambda_n \in \Lambda_0\right) = 1.$


The proof of Theorem 2.1 is given in Section 4.1.

Note that in the definition of $\Lambda_0(M_n)$, $M_n$ can be any sequence
going to infinity. In the examples we have considered in Section 3.1, $M_n$
can be chosen to increase to infinity arbitrarily slowly. If
$\varepsilon_n(\lambda)$ is (rate) constant, (2.1) presents no interest since
$\Lambda_0 = \Lambda_n$, but if for some $\lambda \ne \lambda'$ the fraction
$\varepsilon_n(\lambda)/\varepsilon_n(\lambda')$ either goes to infinity or to
0, then choosing $M_n$ increasing slowly enough to infinity, Theorem 2.1
implies that the MMLE converges to a meaningful subset of $\Lambda_n$. In
particular our results are too crude to be informative in the parametric
case. Indeed from [20], in the parametric non-degenerate case
$\varepsilon_n(\lambda)^2 \asymp (\log n)/n$ in definition (2.2) for all
$\lambda$ and $\Lambda_0 = \Lambda$. In the parametric degenerate case, where
$\lambda_0$ belongs to the boundary of the set $\Lambda$, one would have at
the limit $\pi(\cdot|\lambda_0) = \delta_{\theta_0}$, corresponding to
$\varepsilon_n(\lambda_0) = 0$. So we do recover the oracle parametric value
of [20]. However
for the condition
$\log N_n(\Lambda_n \setminus \Lambda_0) = o(n w_n^2 \varepsilon_{n,0}^2)$ to
be valid one would require $w_n^2 \asymp \log n$, corresponding essentially
to $\Lambda_0$ being the whole set.

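Using the above theorem, together with [9], we obtain the associated
posterior concentration rate, controlling uniformly
$\Pi(d(\theta_0, \theta) \le \varepsilon_n | x_n, \lambda)$ over
$\lambda \in \Lambda_0$, with $\varepsilon_n = M_n\varepsilon_{n,0}$. To do so
we consider the following additional assumptions:

• (C1) For every $c_2 > 0$ there exists a constant $N > 0$ such that for all
$\lambda \in \Lambda_0$ and $n \ge N$, there exists $\Theta_n(\lambda)$
satisfying

(2.10)    $\sup_{\lambda \in \Lambda_0} \int_{\Theta_n(\lambda)^c} Q_{\lambda,n}^\theta(\mathcal{X}_n)\,d\Pi(\theta|\lambda) \le e^{-c_2 n \varepsilon_{n,0}^2}.$

• (C2) There exist $0 < c_1, \zeta < 1$ such that for all
$\lambda \in \Lambda_0$ and all $\theta \in \Theta_n(\lambda)$, there exist
tests $\varphi_n(\theta)$ satisfying (2.7) and (2.9), where
(2.9) is supposed to hold for any $u \ge MM_n\varepsilon_{n,0}$ for some
$M > 0$.

• (C3) There exists $C_0 > 0$ such that for all $\lambda \in \Lambda_0$, for
all $\theta \in \{d(\theta_0, \theta) \le M_n\varepsilon_{n,0}\} \subset \Theta_n(\lambda)$,

    $\sup_{\rho(\lambda,\lambda') \le u_n} d\left(\theta, \psi_{\lambda,\lambda'}(\theta)\right) \le C_0 M_n \varepsilon_{n,0}.$
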

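Corollary 2.1. Assume that $\hat\lambda_n \in \Lambda_0$ with probability
going to 1 under $P_{\theta_0}^n$ and that assumptions (C1)-(C3) and (B1) are
satisfied. Then, if $\log N_n(\Lambda_0) \le O(n\varepsilon_{n,0}^2)$, there
exists $M > 0$ such that

(2.11)    $E_{\theta_0}^n \Pi\left(\theta : d(\theta, \theta_0) \ge MM_n\varepsilon_{n,0} \,|\, x_n; \hat\lambda_n\right) = o(1).$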


A consequence of Corollary 2.1 is in terms of frequentist risks of Bayesian
estimators. Following [4] one can construct an estimator $\hat\theta$ based on
the posterior which converges at the posterior concentration rate:
$d(\hat\theta, \theta_0) = O(M_n\varepsilon_{n,0})$. Similar results can also
be derived for the posterior mean in case $d(\cdot,\cdot)$ is convex and
bounded, and (2.11) is of order $O(M_n\varepsilon_{n,0})$, see for
instance [12].

Corollary 2.1 is proved in a similar way to Theorem 1 of [9], apart from
the lower bound on the marginal likelihood since here we use the nature of
the MMLE which simplifies the computations. The details are presented in
Section 4.2. We can refine the condition on tests (C3) by considering slices
as in [9].

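Next we provide a lower bound on the contraction rate of the MMLE
empirical Bayes posterior distribution. For this we have to introduce some
further assumptions. First of all we extend assumption (2.5) to the set
$\Lambda_0$. Let $e : \Theta \times \Theta \to \mathbb{R}^+$ be a
pseudo-metric and assume that for all $\lambda \in \Lambda_0$ and
some $\delta_n$ tending to zero we have

(2.12)    $\sup_{\{\|\theta - \theta_0\| \le \varepsilon_n(\lambda)\} \cap \Theta_n(\lambda)} \frac{\log Q_{\lambda,n}^\theta(\mathcal{X}_n)}{n\varepsilon_n^2(\lambda)} = o(1), \qquad \sup_{\lambda \in \Lambda_0} \frac{n\varepsilon_{n,0}^2}{-\log \Pi\left(\theta : e(\theta, \theta_0) \le 2\delta_n\varepsilon_{n,0} \,|\, \lambda\right)} = o(1)$

and consider the modified version of (C3):

(C3bis) There exists $C_0 > 0$ such that for all $\lambda \in \Lambda_0$, for
all $\theta \in \{e(\theta_0, \theta) \le \delta_n\varepsilon_{n,0}\} \subset \Theta_n(\lambda)$,

    $\sup_{\rho(\lambda,\lambda') \le u_n} d\left(\theta, \psi_{\lambda,\lambda'}(\theta)\right) \le C_0 \delta_n \varepsilon_{n,0}.$
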


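Theorem 2.2. Assume that conditions (A1)-(C2) and (C3bis) together
with assumption (2.12) hold. In case
$\log N_n(\Lambda_0) = o(n\varepsilon_{n,0}^2)$ and
$\varepsilon_{n,0}^2 > m_n(\log n)/n$ we get that

    $E_{\theta_0}^n \Pi\left(\theta : e(\theta, \theta_0) \le \delta_n\varepsilon_{n,0} \,|\, \hat\lambda_n, x_n\right) = o(1).$

Typically $e(\cdot,\cdot)$ will be either $d(\cdot,\cdot)$ or $\|\cdot\|$.
The lower bound is proved using the same argument as the one used to bound
$E_{\theta_0}^n\left(\Pi(\Theta_n^c | \hat\lambda_n, x_n)\right)$, see
Sections 4.1 and 4.2, where
$\{d(\theta, \theta_0) \le \delta_n\varepsilon_{n,0}\}$ plays the same role
as $\Theta_n^c$. We postpone the details of the proof to Section B.7 of the
supplementary material [25].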

Theorem 2.1 describes the asymptotic behaviour of the MMLE $\hat\lambda_n$
via the oracle set $\Lambda_0$; in other words it minimizes
$\varepsilon_n(\lambda)$. The use of the Banach norm is particularly adapted
to the case of priors on parameters
$\theta = (\theta_i)_{i \in \mathbb{N}} \in \ell_2$, where the $\theta_i$ are
assumed independent. This type of prior is studied in Section 3.1.

2.2. Contraction rate of the hierarchical Bayes posterior. In this section
we investigate the relation between the MMLE empirical Bayes method and
the hierarchical Bayes method. We show that under the preceding assumptions,
complemented with not too restrictive conditions on the hyper-prior
distribution, the hierarchical posterior distribution achieves the same
convergence rate as the MMLE empirical Bayes posterior. Let us denote by
$\tilde\pi(\cdot)$ the density function of the hyper-prior; then the
hierarchical prior takes the form

    $\Pi(\cdot) = \int_\Lambda \Pi(\cdot|\lambda)\tilde\pi(\lambda)\,d\lambda.$

Note that we integrate here over the whole hyper-parameter space $\Lambda$,
not over the subset $\Lambda_n \subset \Lambda$ used in the MMLE empirical
Bayes approach.

Intuitively, to have the same contraction rate one would need that the
set of probable hyper-parameter values $\Lambda_0$ accumulates enough
hyper-prior mass. Let us introduce a sequence $\tilde w_n$ satisfying
$\tilde w_n = o(M_n \wedge w_n)$ and denote by $\Lambda_0(\tilde w_n)$ the set
defined in (2.3) with $\tilde w_n$ in place of $M_n$.

• (H1) Assume that $\tilde\Lambda_0 \subset \Lambda_0(\tilde w_n)$ and for
some sufficiently large $c_0 > 0$ there exists $N > 0$ such that for all
$n \ge N$ the hyper-prior satisfies


f Tr{\)dX 
Jao 




and 


n{X)dX < 


• (H2) Uniformly over $\lambda \in \Lambda_0$ and
$\{\theta : \|\theta - \theta_0\| \le K\varepsilon_n(\lambda)\}$ there exists
$c_3 > 0$ such that

inf -iniOo) < -c^neniXf] = O(e"”^-.o). 


rxn 




We can then show that the preceding conditions are sufficient for giving
upper and lower bounds for the contraction rate of the hierarchical posterior
distribution.


Theorem 2.3. Assume that the conditions of Theorem 2.1 and Corollary 2.1
hold alongside conditions (H1) and (H2). Then the hierarchical
posterior achieves the oracle contraction rate (up to a slowly varying term)

    $E_{\theta_0}^n \Pi\left(\theta : d(\theta, \theta_0) \ge MM_n\varepsilon_{n,0} \,|\, x_n\right) = o(1).$

Furthermore if condition (2.12) also holds we have that

    $E_{\theta_0}^n \Pi\left(\theta : d(\theta, \theta_0) \le \delta_n\varepsilon_{n,0} \,|\, x_n\right) = o(1).$

The proof of the theorem is given in Section 4.3.

3. Application to sequence parameters and histograms.

3.1. Sequence parameters. In this section we apply Theorem 2.1 and
Corollary 2.1 to the case of priors on
$(\Theta, \|\cdot\|) = (\ell_2, \|\cdot\|_2)$. We endow
the sequence parameter $\theta = (\theta_1, \theta_2, \ldots)$ with
independent product priors of the following three types:

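(T1) Sieve prior: The hyper-parameter of interest is $\lambda = k$, the
truncation. For $2 \le k$,

    $\theta_j \stackrel{iid}{\sim} g(\cdot)$, if $j \le k$, and $\theta_j = 0$ if $j > k$.

We assume that $\int e^{s_0|x|^{p^*}} g(x)\,dx = a < +\infty$ for some
$s_0 > 0$ and $p^* \ge 1$.

(T2) Scale parameter of a Gaussian process prior: let
$\tau_j = \tau j^{-\alpha - 1/2}$ and $\lambda = \tau$ with

    $\theta_j \stackrel{ind}{\sim} N(0, \tau_j^2)$, if $1 \le j \le n$, and $\theta_j = 0$ if $j > n$.

(T3) Rate parameter: same prior as above but this time $\lambda = \alpha$.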
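
The three prior types can be simulated in a few lines. The sketch below is an illustration only: the helper names are hypothetical, the density $g$ is represented by a user-supplied sampler `draw_g`, and the Gaussian types (T2) and (T3) share one function because they differ only in which of $\tau$ and $\alpha$ is treated as the hyper-parameter. Only the first `n` coordinates are returned, the remaining ones being zero.

```python
import random


def sample_sieve(k, n_coords, draw_g, rng):
    """Draw from a (T1)-type prior: iid draws from g for j <= k, zero afterwards."""
    return [draw_g(rng) if j <= k else 0.0 for j in range(1, n_coords + 1)]


def sample_gaussian(tau, alpha, n, rng):
    """Draw from a (T2)/(T3)-type prior: theta_j ~ N(0, (tau * j^(-alpha - 1/2))^2), j <= n."""
    return [rng.gauss(0.0, tau * j ** (-alpha - 0.5)) for j in range(1, n + 1)]
```

A typical call is `sample_gaussian(1.0, 1.0, 100, random.Random(0))`; passing an explicit `random.Random` instance keeps the draws reproducible.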

Remark 3.1. Alternatively one could consider the priors (T2) and (T3)
without truncation at level $n$. The theoretical behaviour of the truncated
and non-truncated versions of the priors is very similar; however, from a
practical point of view the truncated priors are arguably more natural.

In the hierarchical setup with a prior on $k$, the Type (T1) prior has been
studied by [1, 27] for generic models, by [23] for density estimation, by
[2] for the Gaussian white noise model and by [21] for inverse problems. Type
(T2) and (T3) priors have been studied with fixed hyper-parameters by
[5, 8, 15, 33, 37] or using a prior on $\lambda = \tau$ and
$\lambda = \alpha$ in [4, 16, 19, 30]. In the white noise model, using the
explicit expressions of the marginal likelihoods and the posterior
distributions, [16, 30] have derived posterior concentration rates and
described quite precisely the behaviours of the MMLE using type
(T3) and (T2) priors, respectively.

In the following, $\Pi(\cdot|k)$ denotes a prior in the form (T1), while
$\Pi(\cdot|\tau, \alpha)$ denotes either (T2) or (T3).

3.2. Deriving $\varepsilon_n(\lambda)$ for priors (T1)-(T3). It appears from
Theorem 2.1 that a key quantity to describe the behaviour of the MMLE is
$\varepsilon_n(\lambda)$ defined by (2.1). In the following lemmas we describe
$\varepsilon_n(\lambda) \equiv \varepsilon_n(\lambda, K)$ for any $K > 0$
under the three types of priors above and for true parameters $\theta_0$
belonging to either hyper-rectangles

    $\mathcal{H}_\infty(\beta, L) = \{\theta_0 = (\theta_{0,i})_i : \max_i i^{2\beta+1}\theta_{0,i}^2 \le L\}$

or Sobolev balls

    $S_\beta(L) = \{\theta_0 = (\theta_{0,i})_i : \sum_{i=1}^\infty i^{2\beta}\theta_{0,i}^2 \le L\}.$


Lemma 3.1. Consider priors of type (T1), with $g$ positive and continuous
on $\mathbb{R}$ and let $\theta_0 \in \ell_2$; then for all $K > 0$ fixed and
if $k \in \{2, \ldots, \epsilon n/\log n\}$, with $\epsilon > 0$ a small
enough constant,

    $\varepsilon_n(k)^2 \asymp \sum_{i=k+1}^\infty \theta_{0,i}^2 + \frac{k \log n}{n}.$

Moreover if $\theta_0 \in \mathcal{H}_\infty(\beta, L) \cup S_\beta(L)$ with
$\beta > 0$ and $L$ any positive constant,

(3.1)    $\varepsilon_{n,0} \lesssim (n/\log n)^{-\beta/(2\beta+1)},$

and there exists $\theta_0 \in \mathcal{H}_\infty(\beta, L) \cup S_\beta(L)$
for which (3.1) is also a lower bound.


The proof of Lemma 3.1 is postponed to Appendix A.1. We note that it
is enough in the above lemma to assume that $g$ is positive and continuous
over the set $\{|x| \le M\}$ with $M > 2\|\theta_0\|_\infty$.


Remark 3.2. One can get rid of the $\log n$ factor in the rate by allowing
the density $g$ to depend on $n$, see for instance [2], [11]. These results
can be recovered (and adapted to the MMLE empirical Bayes case) by a slight
modification of the proof of Lemma 3.1.
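
The trade-off in Lemma 3.1 between the squared bias $\sum_{i>k}\theta_{0,i}^2$ and the term $k\log n/n$ can be explored numerically. The sketch below treats the two-sided bound as an equality with constant one, which is an assumption made purely for illustration; the function names and the choice of test sequence are likewise hypothetical.

```python
import math


def sieve_rate_sq(k, n, theta0):
    """Proxy for eps_n(k)^2: tail sum of theta0 beyond k plus k * log(n) / n.

    theta0 is a finite list standing in for the true sequence; constants in the
    two-sided bound of Lemma 3.1 are ignored (assumption of this sketch).
    """
    tail = sum(t * t for t in theta0[k:])
    return tail + k * math.log(n) / n


def best_truncation(n, theta0, k_max):
    """Truncation level in {2, ..., k_max} minimizing the proxy rate."""
    return min(range(2, k_max + 1), key=lambda k: sieve_rate_sq(k, n, theta0))
```

For instance, with $\theta_{0,i} = i^{-\beta-1/2}$ the minimized proxy stays within a constant factor of $(n/\log n)^{-2\beta/(2\beta+1)}$, in line with (3.1).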


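Priors of type (T2) and (T3) are Gaussian process priors, thus following
[33], let us introduce the so-called concentration function

(3.2)    $\varphi_{\theta_0}(\varepsilon; \alpha, \tau) = \inf_{h \in \mathbb{H}^{\alpha,\tau} : \|h - \theta_0\|_2 \le \varepsilon} \|h\|_{\mathbb{H}^{\alpha,\tau}}^2 - \log \Pi\left(\|\theta\|_2 \le \varepsilon \,|\, \alpha, \tau\right),$

where $\mathbb{H}^{\alpha,\tau}$ denotes the Reproducing Kernel Hilbert Space
(RKHS) associated to the Gaussian prior $\Pi(\cdot|\alpha, \tau)$,

    $\mathbb{H}^{\alpha,\tau} = \{\theta = (\theta_i)_{i \in \mathbb{N}};\ \sum_{i=1}^n i^{2\alpha+1}\theta_i^2 < +\infty,\ \theta_i = 0 \text{ for } i > n\} = \mathbb{R}^n,$

with for all $\theta \in \mathbb{H}^{\alpha,\tau}$

    $\|\theta\|_{\mathbb{H}^{\alpha,\tau}}^2 = \tau^{-2}\sum_{i=1}^n i^{2\alpha+1}\theta_i^2.$

Then from Lemma 5.3 of [34]

(3.3)    $\varphi_{\theta_0}(K\varepsilon; \alpha, \tau) \le -\log \Pi\left(\|\theta - \theta_0\|_2 \le K\varepsilon \,|\, \alpha, \tau\right) \le \varphi_{\theta_0}(K\varepsilon/2; \alpha, \tau).$

We also have that

(3.4)    $c_1^{-1}(K\varepsilon/\tau)^{-1/\alpha} \le -\log \Pi\left(\|\theta\|_2 \le K\varepsilon \,|\, \alpha, \tau\right) \le c_1(K\varepsilon/\tau)^{-1/\alpha},$

for some $c_1 \ge 1$, see for instance Theorem 4 of [17]. This leads to the
following two lemmas.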

Lemma 3.2. In the case of Type (T2) and (T3) priors, with
$\theta_0 \in S_\beta(L) \cup \mathcal{H}_\infty(\beta, L)$:

• If $\beta \ne \alpha + 1/2$


(3.5) 


vm 


+ < en(A) < n 2^+12^+1 _|_ 


/ a{a,/3)\ 2Qi+i'^2 

\ nr^ J 


a + 1/2 a + ll2 

where a{a,/3) = L P /\2a—2/i+l\ if 9 q T) while a{a, fi) — ^ ^ 

if $\theta_0 \in S_\beta(L)$. The constants depend possibly on $K$ but on
neither $n$, $\tau$ nor $\alpha$.

• If $\beta = \alpha + 1/2$ then

(3.6) 


\Mi 


■^nr 2 >i+’^ 2 “+i'r 2 “+i < e„(A) < n 2 a+i 7 - 20 + 1 - 1 - 


/log(nr^) 


nr^ 




nr^ 


11, 




where the term $\log(n\tau^2)$ can be eliminated in the case where
$\theta_0 \in S_\beta(L)$.


Lemma 3.3. In the case of prior type (T2) (with $\lambda = \tau$):

• If $\alpha + 1/2 < \beta$ then for all
$\theta_0 \in \mathcal{H}_\infty(\beta, L) \cup S_\beta(L)$

(3.7)    $\varepsilon_{n,0} \lesssim n^{-(2\alpha+1)/(4\alpha+4)},$

and for all $\theta_0 \in \ell_2(L)$ satisfying $\|\theta_0\|_2 \ge c$ for
some fixed $c > 0$, (3.7) is also a lower bound.

• If $\alpha + 1/2 > \beta$ then

(3.8)    $\varepsilon_{n,0} \lesssim n^{-\beta/(2\beta+1)}.$

• If $\alpha + 1/2 = \beta$ then

, . e„,o < if Oq e nooi/^, L), 

en,o < if OoeS/siL), 

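and there exists $\theta_0 \in \mathcal{H}_\infty(\beta, L)$ for which the
upper bound (3.9) is also a lower bound.

In the case of prior type (T3) (with $\lambda = \alpha$),

(3.10)    $\varepsilon_{n,0} \lesssim n^{-\beta/(2\beta+1)}$, if $\theta_0 \in S_\beta(L) \cup \mathcal{H}_\infty(\beta, L)$.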

We note that for the scaling prior (T2) in the case $\alpha + 1/2 < \beta$,
Lemma 3.3 provides us the sub-optimal rate
$\varepsilon_{n,0} \asymp n^{-(2\alpha+1)/(4\alpha+4)}$. Therefore under
condition (2.12) (verified in the supplementary material for prior (T2)) in
all three types of examples studied in this paper (white noise, regression and
estimation of density models), we get that for all $\theta_0 \ne 0$ with
$\alpha + 1/2 < \beta$, the type (T2) prior leads to sub-optimal posterior
concentration rates (and in case
$\theta_0 \in \mathcal{H}_\infty(\beta, L)$, $\beta = \alpha + 1/2$ as well).

An important tool to derive posterior concentration rates in the case of
empirical Bayes procedures is the construction of the change of measure
$\psi_{\lambda,\lambda'}$. We present in the following section how these
changes of measure can be constructed in the context of priors (T1)-(T3).

3.3. Change of measure. In the case of prior (Tl), there is no need to 
construct due to the discrete nature of the hyper-parameter X = k the 
truncation threshold. 

In the case of prior (T2) if r, r' > 0 then dehne for alH G N 

(3.11) V’r.r'(0i) = -0* 

r 

so that ipryiQ) = {'>pT,T'{9i),i G N) = Or' (t and if 0 ~ n(-|r, a), then 
V’r,T'(^) ~ n(-|r',a). 

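Similarly, in the case of the type (T3) prior,

\[ \psi_{\alpha,\alpha'}(\theta_i) = i^{\alpha-\alpha'}\theta_i, \tag{3.12} \]

so that $\psi_{\alpha,\alpha'}(\theta) = (\psi_{\alpha,\alpha'}(\theta_i), i \in \mathbb{N})$ and if $\theta \sim \Pi(\cdot|\tau, \alpha)$, then $\psi_{\alpha,\alpha'}(\theta) \sim \Pi(\cdot|\tau, \alpha')$. Note in particular that if $\alpha' \geq \alpha$ and $\sum_i \theta_i^2 < +\infty$ hold then $\sum_i \psi_{\alpha,\alpha'}(\theta_i)^2 < \infty$. This will turn out to be useful in the sequel.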
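The two changes of measure can be checked numerically. The sketch below assumes (this is not restated in the present section) that the priors (T2) and (T3) are centred Gaussian with $\theta_i \sim N(0, \tau^2 i^{-2\alpha-1})$, and that the (T3) map is $\theta_i \mapsto i^{\alpha-\alpha'}\theta_i$. Since a linear map $x \mapsto cx$ sends $N(0, s^2)$ to $N(0, c^2 s^2)$, it suffices to check that the maps transform the prior standard deviations correctly. All function names are hypothetical.

```python
import math


def prior_sd(i, tau, alpha):
    """Standard deviation of theta_i under the assumed prior N(0, tau^2 * i^(-2*alpha-1))."""
    return tau * i ** (-alpha - 0.5)


def sd_vector(k, tau, alpha):
    """Prior standard deviations of the first k coordinates."""
    return [prior_sd(i, tau, alpha) for i in range(1, k + 1)]


def psi_scale(theta, tau, tau_new):
    """Change of measure for the scaling hyper-parameter: theta_i -> (tau_new / tau) * theta_i."""
    return [tau_new / tau * t for t in theta]


def psi_regularity(theta, alpha, alpha_new):
    """Change of measure for the regularity hyper-parameter: theta_i -> i^(alpha - alpha_new) * theta_i."""
    return [i ** (alpha - alpha_new) * t for i, t in enumerate(theta, start=1)]


# Applying each map to the vector of standard deviations gives the
# standard deviations of the prior with the new hyper-parameter.
sds = sd_vector(5, tau=2.0, alpha=1.0)
print(psi_scale(sds, 2.0, 0.5))
print(sd_vector(5, tau=0.5, alpha=1.0))
print(psi_regularity(sds, 1.0, 2.5))
print(sd_vector(5, tau=2.0, alpha=2.5))
```

Each pair of printed vectors agrees up to floating point error, which is the coordinatewise content of the statement that the transformed parameter follows the prior with the new hyper-parameter.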
3.4. Choice of the hyper-prior. In this section we give sufficient conditions on the hyper-priors in the case of the prior distributions (T1)-(T3), such that condition (H1) is satisfied. The proofs are deferred to Section D of the supplementary material [25].

Lemma 3.4. In the case of prior (T1) we choose $\Lambda_n = \{2, 3, \ldots, c_0 n/\log n\}$ for some small enough constant $c_0 > 0$ and assume that $\theta_0 \in \mathcal{S}^{\beta}(L) \cup \mathcal{H}_\infty(\beta, L)$ for some $\beta > \beta_1 > \beta_0 > 0$. Then for any hyper-prior satisfying

(3.13) < it{k) < , 

for some $c_1, c_2 > 0$, assumption (H1) holds. If the prior has support on $\Lambda_n$, the upper bound condition on $\pi$ is not needed.

Note that the hypergeometric and the Poisson distributions satisfy the above conditions.

Lemma 3.5. Consider the prior (T2) and take A„ = 
for some positive cq > 0 (and cq given in condition (HI)). Then for any 
hyper-prior satisfying 
2 

g-cir for r > 1 with some ci > 0 and C 2 > 1 -|- I/cq, 

_2 

< tt{t) < for r < 1 with some C 2 > 0 and C 4 > I/cq — 1 

assumption (H1) holds. Furthermore the upper bound condition can be removed if the prior has support on $\Lambda_n$.

Note that for instance the inverse gamma and Weibull distributions satisfy this assumption.

Remark 3.3. To obtain the polynomial upper bound on the hyper-prior densities $\pi(\tau)$ in Lemma 3.5, the set $\Lambda_n$ is taken to be larger than is necessary in the empirical Bayes method to achieve adaptive posterior contraction rates, see for instance Propositions 3.2 and 3.4. Nevertheless the conditions
on the hyper-entropy are still satisfied, i.e. by taking Un = on 

$\Lambda_n \setminus \Lambda_0$ and $u_n = n^{-d}$ (for any $d > 0$) on $\Lambda_0$ we get that $\log N_n(\Lambda_n) = o(w_n^2 n\varepsilon_{n,0}^2)$ and $\log N_n(\Lambda_0) = O(n\varepsilon_{n,0}^2)$.

Lemma 3.6. Consider the prior (T3), take $\Lambda_n = [0, c_0 n^{c_1}]$ for some positive constants $c_0, c_1$ and assume that $\theta_0 \in \mathcal{S}^{\beta}(L) \cup \mathcal{H}_\infty(\beta, L)$ for some $\beta > \beta_0 > 0$. Then for any hyper-prior satisfying

\[ e^{-c_2\alpha} \lesssim \pi(\alpha) \lesssim e^{-c_0\alpha^{1/c_1}}, \quad \text{for } \alpha > 0 \]

and for some $c_0, c_1, c_2 > 0$, assumption (H1) holds. The upper bound on $\pi$ can be removed by taking the support of the prior to be $\Lambda_n$.

In the following sections, we prove that in the Gaussian white noise, regression and density estimation models the MMLE empirical Bayes posterior concentration rate is bounded from above by $M_n\varepsilon_{n,0}$ and from below by $\delta_n\varepsilon_{n,0}$, where $\varepsilon_{n,0}$ is given in Lemma 3.3 under priors (T1)-(T3) and $M_n$, respectively $\delta_n$, tends to infinity, respectively 0, arbitrarily slowly.

3.5. Application to the nonparametric regression model. In this section we show that our results apply to the nonparametric regression model. We consider the fixed design regression problem, where we assume that the observations $\mathbf{x}_n = (x_1, x_2, \ldots, x_n)$ satisfy

\[ x_i = f_0(t_i) + Z_i, \quad i = 1, 2, \ldots, n, \tag{3.14} \]

where $Z_i \stackrel{iid}{\sim} N(0, \sigma^2)$ are random variables (with known $\sigma^2$ for simplicity) and $t_i = i/n$.

Let us denote by $\theta_0 = (\theta_{0,1}, \theta_{0,2}, \ldots)$ the Fourier coefficients of the regression function $f_0 \in L_2(M)$: $f_0(t) = \sum_{j=1}^{\infty} \theta_{0,j} e_j(t)$, so that $(e_j(\cdot))_j$ is the Fourier basis. We note that following from Lemma 1.7 in [32] and Parseval's inequality we have that

\[ \|f_0\|_2 = \|\theta_0\|_2 = \|f_0\|_n, \]

where $\|f_0\|_n$ denotes the $L_2$-metric associated to the empirical norm.

First we deal with the random truncation prior (T1), where applying Theorem 2.1, Corollary 2.1 and Theorem 2.3 combined with Lemma 3.1 we get that both the MMLE empirical Bayes and hierarchical Bayes posteriors are rate adaptive (up to a $\log n$ factor). The following proposition is proved in Section B.1 of the supplementary material [25].

Proposition 3.1. Assume that $f_0 \in \mathcal{H}_\infty(\beta, L) \cup \mathcal{S}^{\beta}(L)$ and consider a type (T1) prior. Let $\Lambda_n = \{2, \ldots, k_n\}$ with $k_n = \epsilon n/\log n$ for some small enough constant $\epsilon > 0$. Then, for any $M_n$ tending to infinity and $K > 0$, the MMLE $\hat{k}_n \in \Lambda_0 = \{k : \varepsilon_n(k) \leq M_n\varepsilon_{n,0}\}$ with probability going to 1 under $P^n_{\theta_0}$, where $\varepsilon_n(k)$ and $\varepsilon_{n,0}$ are given in Lemma 3.1.

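Furthermore we also have the following contraction rates: for all $0 < \beta_1 < \beta_2 < +\infty$, uniformly over $\beta \in (\beta_1, \beta_2)$

\[ \sup_{f_0 \in \mathcal{H}_\infty(\beta,L) \cup \mathcal{S}^{\beta}(L)} E^n_{f_0} \Pi\Big( f : \|f_0 - f\|_2 \geq M_n (n/\log n)^{-\frac{\beta}{2\beta+1}} \,\Big|\, \mathbf{x}_n, \hat{k}_n \Big) = o(1), \]

\[ \sup_{f_0 \in \mathcal{H}_\infty(\beta,L) \cup \mathcal{S}^{\beta}(L)} E^n_{f_0} \Pi\Big( f : \|f_0 - f\|_2 \geq M_n (n/\log n)^{-\frac{\beta}{2\beta+1}} \,\Big|\, \mathbf{x}_n \Big) = o(1), \]

where the latter is satisfied if the hyper-prior on $k$ satisfies (3.13).

Finally we note that the above bounds are sharp in the sense that both the MMLE empirical and the hierarchical Bayes posterior contraction rates are bounded from below by $\delta_n (n/\log n)^{-\beta/(2\beta+1)}$ with $P^n_{\theta_0}$-probability tending to one, for any $\delta_n = o(1)$ and some $\theta_0 \in \mathcal{H}_\infty(\beta, L) \cup \mathcal{S}^{\beta}(L)$.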


Next we consider the priors (T2) and (T3). As a consequence of Theorem 2.1, Corollary 2.1, Theorem 2.3, and Lemma 3.3 we can show that both the hierarchical Bayes and the MMLE empirical Bayes methods for the rescaled Gaussian prior (T2) are optimal only in a limited range of regularity classes $\mathcal{S}^{\beta}(L) \cup \mathcal{H}_\infty(\beta, L)$ satisfying $\beta < \alpha + 1/2$; otherwise the posterior achieves the sub-optimal contraction rate $n^{-(2\alpha+1)/(4\alpha+4)}$. However, by taking the MMLE of the regularity hyper-parameter $\alpha$ or endowing it with a hyper-prior distribution in the Gaussian prior (T3), the posterior achieves the minimax contraction rate $n^{-\beta/(2\beta+1)}$. Similar results were derived in [30] and [16] in the context of the (inverse) Gaussian white noise model using semi-explicit computations. We note that our implicit (and general) approach not only reproduces the previous findings in the direct (non-inverse problem) case, but also improves on the posterior contraction rate in the case of the prior (T3), where in [16] an extra $\log n$ factor was present.


Proposition 3.2. Assume that $f_0 \in \mathcal{S}^{\beta}(L) \cup \mathcal{H}_\infty(\beta, L)$ for some $\beta > 0$ and consider type (T2) and (T3) priors with $\alpha > 0$. Furthermore take
A„(r) = n"/^] and h.n{a) = (0, con'^^], respectively, for some cq,ci > 

0. Then $\hat{\lambda}_n \in \Lambda_0$ with $P^n_{f_0}$-probability tending to 1. Furthermore, both in the case of the MMLE empirical Bayes and hierarchical Bayes approach we have for any $M_n$ going to infinity with hyper-priors satisfying (H1) (see for instance Lemma 3.5 and Lemma 3.6) that

• For the multiplicative scaling prior (T2)

– If $\beta > \alpha + 1/2$, the posterior concentration rate is bounded from above by

\[ M_n\varepsilon_{n,0} \asymp M_n n^{-(2\alpha+1)/(4\alpha+4)}, \]

and for $\delta_n = o(1)$ and $\|f_0\|_2 \geq c$ (for some positive constant $c$) it is bounded from below by

\[ \delta_n\varepsilon_{n,0} \gtrsim \delta_n n^{-(2\alpha+1)/(4\alpha+4)}. \]

– If $\beta \leq \alpha + 1/2$, the posterior concentration rate is bounded by

\[ M_n\varepsilon_{n,0} \lesssim M_n n^{-\beta/(2\beta+1)}, \]

with an extra $\log n$ term if $\beta = \alpha + 1/2$ and $f_0 \in \mathcal{H}_\infty(\beta, L)$.

• For the regularity prior (T3) the posterior contraction rate is also

\[ M_n\varepsilon_{n,0} \lesssim M_n n^{-\beta/(2\beta+1)}. \]

Proposition 3.2 is proved in Section B.2 of the supplementary material 
[25]. 

Remark 3.4. In fact our results are stronger than the minimax results presented in Propositions 3.1 and 3.2. From Theorem 2.1 and Corollary 2.1 it follows that for both the MMLE empirical Bayes and the hierarchical Bayes methods the posterior contracts around the truth for every $\theta_0 \in \Theta$ with rate $M_n\varepsilon_{n,0}(\theta_0)$, which is more informative than a statement on the worst case scenario over some regularity class, i.e. the minimax result.


Remark 3.5. We note that in the case of the Gaussian white noise 
model the same posterior contraction rate results (both for the empirical 
Bayes and hierarchical Bayes approaches) hold for the priors (T1)-(T3) as 
in the nonparametric regression model. The proof of this statement can be 
easily derived as a special case of the results on the nonparametric regression, 
see the end of the proofs of Propositions 3.1 and 3.2. 


3.6. Application to density estimation. In this section we consider the density estimation problem on $[0,1]$, i.e. the observations $\mathbf{x}_n = (x_1, \ldots, x_n)$ are independent and identically distributed from a distribution with density $f$ with respect to Lebesgue measure. We consider two families of priors on the set of densities $\mathcal{F} = \{f : [0,1] \to \mathbb{R}_+;\ \int_0^1 f(x)dx = 1\}$. In the first case we parameterize the densities as

\[ f_\theta(x) = \exp\Big( \sum_{j=1}^{\infty} \theta_j\varphi_j(x) - c(\theta) \Big), \tag{3.15} \]

where $c(\theta)$ is the normalizing constant, $(\varphi_j)_{j \geq 0}$ forms an orthonormal basis with $\varphi_0 = 1$ and $\theta = (\theta_j)_{j \in \mathbb{N}} \in \ell_2$. Hence (3.15) can be seen either as a log-linear model or as an infinite dimensional exponential family, see for instance [35], [33], [22], [23] and [1].

In the second case we consider random histograms to parameterize $\mathcal{F}$.

3.6.1. Log-linear model. We study priors based on the parameterization (3.15) and we assume that the true density has the form $f_0 = f_{\theta_0}$ for some $\theta_0 \in \ell_2$, and throughout the section we will assume that $f_0$ verifies $\|\log f_0\|_\infty < +\infty$ and that $\theta_0 \in \mathcal{S}^{\beta}(L)$ for some $L > 0$. We study the MMLE empirical Bayes and hierarchical Bayes methods based on priors of type (T1), (T2) and (T3) in this model. We consider the usual metric in the context of density estimation, namely the Hellinger metric

\[ h(f_1, f_2)^2 = \int_0^1 \big(\sqrt{f_1(x)} - \sqrt{f_2(x)}\big)^2 dx. \]
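As a numerical illustration of the Hellinger metric on $[0,1]$, the following sketch approximates $h(f_1, f_2)$ with a midpoint rule. The function name, the grid size and the two example densities are assumptions made for illustration only.

```python
import math


def hellinger(f1, f2, grid_size=100_000):
    """Midpoint-rule approximation of the Hellinger distance between two densities on [0, 1]."""
    step = 1.0 / grid_size
    total = 0.0
    for m in range(grid_size):
        x = (m + 0.5) * step  # midpoint of the m-th cell
        total += (math.sqrt(f1(x)) - math.sqrt(f2(x))) ** 2
    return math.sqrt(total * step)


def uniform(x):
    """Uniform density on [0, 1]."""
    return 1.0


def linear(x):
    """Density f(x) = 2x on [0, 1]."""
    return 2.0 * x


print(hellinger(uniform, linear) ** 2)  # compare with the closed form 2 - 4*sqrt(2)/3
```

For these two densities the squared distance has the closed form $\int_0^1 (1 - \sqrt{2x})^2 dx = 2 - 4\sqrt{2}/3$, which gives a simple check of the quadrature.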

First we consider the type (T1) prior where $\lambda = k$. We show that Theorems 2.1, 2.3, and Corollary 2.1 can be applied so that the MMLE empirical Bayes and hierarchical posterior rates are minimax adaptive over a collection of Sobolev classes.


Proposition 3.3. Assume that $\theta_0 \in \mathcal{S}^{\beta}(L)$ with $\beta > 1/2$, consider a
type (Tl) prior, and let An = {2, • • • ,kn} with kn = k^y/n/ logn^. Then, 
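for any $M_n$ going to infinity and $K > 0$, if $\hat{k}_n$ is the MMLE over $\Lambda_n$, with probability going to 1 under $P^n_{\theta_0}$, $\hat{k}_n \in \Lambda_0 = \{k;\ \varepsilon_n(k) \leq M_n\varepsilon_{n,0}\}$, where $\varepsilon_n(k)$ and $\varepsilon_{n,0}$ are given in Lemma 3.1, and for all $1/2 < \beta_1 < \beta_2 < +\infty$

\[ \sup_{\beta \in (\beta_1,\beta_2)} \sup_{\theta_0 \in \mathcal{S}^{\beta}(L)} E^n_{\theta_0} \Pi\Big( h(f_{\theta_0}, f_\theta) \geq M_n (n/\log n)^{-\frac{\beta}{2\beta+1}} \,\Big|\, \mathbf{x}_n, \hat{k}_n \Big) = o(1). \]

Similarly the hierarchical posterior distribution with hyper-prior satisfying the conditions of Lemma 3.4 also achieves the (nearly) minimax contraction rate

\[ \sup_{\beta \in (\beta_1,\beta_2)} \sup_{\theta_0 \in \mathcal{S}^{\beta}(L)} E^n_{\theta_0} \Pi\Big( h(f_{\theta_0}, f_\theta) \geq M_n (n/\log n)^{-\frac{\beta}{2\beta+1}} \,\Big|\, \mathbf{x}_n \Big) = o(1). \]

Moreover there exists $\theta_0 \in \mathcal{S}^{\beta}(L)$ for which $\delta_n (n/\log n)^{-\beta/(2\beta+1)}$ is a lower bound on the posterior concentration rate for both adaptive Bayesian methods.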


The proof of Proposition 3.3 is presented in Section B.3 of the supplementary material [25].

We now apply Theorems 2.1, 2.3, and Corollary 2.1 to priors (T2) and 
(T3) and derive similar concentration rates as in the case of the regression 
model. Let 
Proposition 3.4. Assume that $\theta_0 \in \mathcal{S}^{\beta}(L)$ with $\beta > 1/2$ and consider a type (T2) prior with $\alpha > 1/\sqrt{2}$ and $\Lambda_n = (\tau_n, \bar{\tau}_n)$. Then $\hat{\lambda}_n \in \Lambda_0$ with probability going to 1 under $P^n_{\theta_0}$ and the same conclusions as in Proposition 3.2 hold.

The constraint $\alpha > 1/\sqrt{2}$ is to ensure that for all $\beta < \alpha + 1/2$, the value $n^{(\alpha-\beta)/(2\beta+1)}$, which corresponds to the minimizer of $\varepsilon_n(\tau)$ (up to a multiplicative constant), belongs to the set $(\tau_n, \bar{\tau}_n)$.

Proposition 3.5. Assume that $\theta_0 \in \mathcal{S}^{\beta}(L)$ with $\beta > 1/2$ and consider
a type (T3) prior with a > 1/2 and A„ = [1/2 + l/n^/'^,A„], with A^ = 
$\log n/(16\log\log n)$. Then for any $M_n$ going to infinity the MMLE empirical Bayes posterior achieves the minimax contraction rate

\[ M_n\varepsilon_{n,0} \lesssim M_n n^{-\beta/(2\beta+1)}. \]

Furthermore the hierarchical posterior also achieves the minimax contraction rate for hyper-priors satisfying (H1).

The proofs of Propositions 3.4 and 3.5 are presented in Sections B.4 and 
B.5 of the supplementary material [25]. 

We now consider the second family of priors. 

3.6.2. Random histograms. In this section we parameterize $\mathcal{F}$ using piecewise constant functions, as in [6] for instance. In other words we define

\[ f_\theta(x) = k\sum_{j=1}^{k} \theta_j 1_{I_j}(x), \quad I_j = ((j-1)/k, j/k], \quad \sum_{j=1}^{k} \theta_j = 1, \quad \theta_j \geq 0, \tag{3.16} \]

and we consider a Dirichlet prior on $\theta = (\theta_1, \ldots, \theta_k)$ with parameter $(\alpha, \ldots, \alpha)$. The hyper-parameter on which maximization is performed is $\lambda = k$, as in the case of the truncation prior (T1). We define the sequence $\varepsilon_n(k)$ in terms of the Hellinger distance, i.e. it satisfies (2.1) with $h(f_0, f_\theta)$ replacing $\|\theta - \theta_0\|$. We then have the following result.
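For concreteness, a histogram density of this kind can be sketched as follows, assuming the parameterization $f_\theta(x) = k\theta_j$ for $x$ in the $j$-th of $k$ equal-width bins. The function name is hypothetical, and the code uses left-closed bins, which differs from half-open intervals closed on the right only on a set of Lebesgue measure zero.

```python
def histogram_density(theta):
    """Piecewise constant density on [0, 1] with k = len(theta) equal-width bins.

    theta must lie in the k-dimensional simplex (non-negative, summing to one).
    """
    k = len(theta)
    if any(t < 0 for t in theta) or abs(sum(theta) - 1.0) > 1e-9:
        raise ValueError("theta must be non-negative and sum to one")

    def density(x):
        if not 0.0 <= x <= 1.0:
            return 0.0
        j = min(int(x * k), k - 1)  # bin index; x = 1 falls in the last bin
        return k * theta[j]

    return density


f = histogram_density([0.2, 0.3, 0.5])
grid = 3000
# Midpoint-rule integral of the density over [0, 1]; should be close to 1.
print(sum(f((m + 0.5) / grid) for m in range(grid)) / grid)
```

The factor $k$ compensates for the bin width $1/k$, so the weights $\theta_j$ are exactly the bin probabilities.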


Proposition 3.6. Assume that $f_0$ is continuous and bounded from above and below by $C_0$ and $c_0$ respectively. If $\Lambda = \{1, \ldots, k_n\}$, with $k_n = O(n/\log n)$ and if $\alpha \leq A$ for some constant $A$ independent of $k$, then for all $k \in \Lambda$

\[ b(k)^2 + \frac{k\log(n/k)}{n} \lesssim \varepsilon_n(k)^2 \lesssim b(k)^2 + \frac{k\log n}{n}, \tag{3.17} \]

with



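Now suppose that $f_0 \in \mathcal{H}_\infty(\beta, L)$, with $L > 0$ and $\beta \in (0,1]$. The MMLE empirical Bayes posterior achieves the minimax contraction rate (up to a $\log n$ term), i.e. for all $M_n \to +\infty$

\[ M_n\varepsilon_{n,0} \lesssim M_n (n/\log n)^{-\beta/(2\beta+1)} \]

and

\[ \Pi\big( h(f_0, f_\theta) \leq M_n\varepsilon_{n,0} \,\big|\, \mathbf{x}_n, \hat{k}_n \big) = 1 + o_p(1). \]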

Equation (3.17) of Proposition 3.6 is proved in Appendix A.4, while the 
rest of the proof is given in Section B.6 of the supplementary material [25]. 

4. Proofs. 

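4.1. Proof of Theorem 2.1. Following from the definition of $\hat{\lambda}_n$ given in (1.1) we have that $m(\mathbf{x}_n|\lambda) \leq m(\mathbf{x}_n|\hat{\lambda}_n)$ for all $\lambda \in \Lambda_n$. Therefore to prove our statement it is sufficient to show that with $P^n_{\theta_0}$-probability tending to one we have

\[ \sup_{\lambda \in \Lambda_n \setminus \Lambda_0} m(\mathbf{x}_n|\lambda) < m(\mathbf{x}_n|\lambda_0) \leq \sup_{\lambda \in \Lambda_0} m(\mathbf{x}_n|\lambda), \]

where $\lambda_0$ is some hyper-parameter belonging to $\Lambda_0$ (possibly dependent on $n$).

We proceed in two steps. First we show that there exists a constant $C > 0$ such that with $P^n_{\theta_0}$-probability tending to one we have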

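\[ m(\mathbf{x}_n|\lambda_0) \geq e^{-Cn\varepsilon_{n,0}^2}. \tag{4.1} \]

Then we finish the proof by showing that for any sequence $w_n' = o(M_n \wedge w_n)$ going to infinity

\[ P^n_{\theta_0}\Big( \sup_{\lambda \in \Lambda_n \setminus \Lambda_0} m(\mathbf{x}_n|\lambda) > e^{-w_n'^2 n\varepsilon_{n,0}^2} \Big) = o(1). \tag{4.2} \]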

We prove the first inequality (4.1) using the standard technique for lower bounds of the likelihood ratio (e.g. Lemma 10 of [13]). Without loss of generality we can assume that there exists $\lambda \in \Lambda_n$ such that $\varepsilon_n(\lambda) > \varepsilon_{n,0}$. Then take an arbitrary $\lambda_0 \in \Lambda_0$ such that $\varepsilon_n(\lambda_0) \leq M_1\varepsilon_{n,0}$ for an arbitrary $M_1 > 1$. Then we have from the assumption (B1) and the definition of $\varepsilon_n(\lambda)$ given in (2.2) that with $P^n_{\theta_0}$-probability tending to one the following inequality holds

m(x„|Ao) > [ 

J d&Bn{dQ ,M2£n0'0) ,‘i) 

> n(P„(0o,M2en(Ao),2)| Ao) 


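We now prove (4.2). Split $\Lambda_n \setminus \Lambda_0$ into balls of size $u_n/2$ and choose for each ball a point in $\Lambda_n \setminus \Lambda_0$. We denote by $(\lambda_i)_{i \leq N_n(\Lambda_n \setminus \Lambda_0)}$ these points. Consider the set $\Theta_n(\lambda_i)$ defined in (2.6) and divide it into sieves

\[ S^{(i)}_{n,j} = \{\theta \in \Theta_n(\lambda_i);\ j\varepsilon_n(\lambda_i)c(\lambda_i) \leq d(\theta, \theta_0) \leq (j+1)\varepsilon_n(\lambda_i)c(\lambda_i)\}. \]

We have following from assumption (2.9) that for all $j$

\[ \log N\big(\zeta j\varepsilon_n(\lambda_i)c(\lambda_i), S^{(i)}_{n,j}, d(\cdot,\cdot)\big) \leq c_1 n j^2\varepsilon_n(\lambda_i)^2 c(\lambda_i)^2/2 \tag{4.4} \]

and constructing a net of $S^{(i)}_{n,j}$ with radius $\zeta j\varepsilon_n(\lambda_i)c(\lambda_i)$ we have following from assumption (2.7) that there exist tests $\varphi^{(i)}_{n,j}$ satisfying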


(4.5) 


I,., 


P (Al) 

PPj)me\\,) 


<; g-cinpe„(Ai)2c(Ai)^ 

< |Ai). 


(i) 

Let us take the test $\varphi_{n,i} = \max_j \varphi^{(i)}_{n,j}$ and for convenience introduce the notation $B_n(\lambda) = \Theta_n(\lambda) \cap \{\theta : \|\theta - \theta_0\| \leq K\varepsilon_n(\lambda)\}$. Then using the chaining argument, Markov's inequality, Fubini's theorem and (2.7) we get that
(4.6) 

P^l sup m(x„|A)>e-"«o) 

\AeA„\Ao J 

N„{A„\Ao) 


4Vn(An\Ao) / \ 

- X] ( sup m(x„|A) > j 


Nr,(An\Ao) 


2 = 1 
Y„(A„\Ao) 



sup 


Y„(A„\Ao) 


E 

sup 

2=1 

V(Ai,A)< 

Y„(A„\Ao) 


E ^.1 

sup 

2=1 



/ 

J Xhy 


JAe)-ir,{&0 


)dn(0|A)) 




(1 9^72,2 


oU0)-^r. 


< Nn{An \ 

Y„(A„\Ao) 

E / Qi,n{A^n)dIl{9\\,) 

i=\ Bn{Xi) 

iVn(A^\Ao) 

+ E / Q^,^,Jl-99n,*)dn(0|A, 

^ 70„(Ai)nB„(Ai)'= 

Y„(A„\Ao) 

+ E / QLn('^n)cin(0|A,)|. 


Next we deal with each term on the right hand side of (4.6) separately and show that all of them tend to zero. One can easily see that since $\lambda_i \in \Lambda_n \setminus \Lambda_0$ and following the definition of $c(\lambda_i)$ given below (2.8), we have that

Nn{An \ Ao)e-('^l/2)nmLs4Ad^c(Ad2 < \ = o(l). 


For the second term we have following from assumption (2.5), the definitions of $\varepsilon_n(\lambda)$ and the set $\Lambda_0$ given in (2.2) and (2.3), respectively, that
N„(A„\Ao) 


N„(A„\Ao) 

< ^ e”^"<oe°(i)"4(Adn(B„(Ai)|Ai) 

i=l 

< g-«Af2£2_(j(c-l+o(l)) ^ 


L 


Bn(Ai) 


Q%^^{Xn)dUie\Xi 


Next following from (4.5) we have that 

Nn[An\AQ) 

e»«. ^ / (3l,„(i - ^„)<in(()|A,) 

1=^ (Aj)n.Sn 

iVn(An\Ao) 

< g^«'n4,o ^ g-cin£„(Ai)2c(Ai)^ 

i=l 

< g-cin«)2e2 j^(i+o(i)) ^ 


Finally we have, following from assumption (2.6), that the fourth term on the right hand side of (4.6) can be bounded from above by




N„(A„\Ao) 


° E / Qi n{X^"^)d^{0\K) < Nn{K \ Ao)e-(“''-<Xo 

^0n(Ad" 

< g-"’«'n4,o(l+o(l)) = 


2 = 1 


4.2. Proof of Corollary 2.1. The proof of Corollary 2.1 follows the same lines of reasoning as Theorem 1 in [9], with the additional remark that

\[ m(\mathbf{x}_n|\hat{\lambda}_n) \geq m(\mathbf{x}_n|\lambda), \quad \forall\lambda \in \Lambda_n, \]

so that no uniform lower bound of the form $\inf_{\lambda \in \Lambda_0} m(\mathbf{x}_n|\lambda)$ is required. We have


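\[ E^n_{\theta_0}\Pi\big( d(\theta, \theta_0) \geq MM_n\varepsilon_{n,0} \,\big|\, \mathbf{x}_n; \hat{\lambda}_n \big) = E^n_{\theta_0}\left( \frac{\int_{d(\theta,\theta_0) \geq MM_n\varepsilon_{n,0}} e^{\ell_n(\theta) - \ell_n(\theta_0)} d\Pi(\theta|\hat{\lambda}_n)}{\int_{\Theta} e^{\ell_n(\theta) - \ell_n(\theta_0)} d\Pi(\theta|\hat{\lambda}_n)} \right) = E^n_{\theta_0}\left( \frac{H_n(\hat{\lambda}_n)}{m(\mathbf{x}_n|\hat{\lambda}_n)} \right). \]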

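We construct $\varphi_n = \max_{\lambda_i}\max_j\max_l \varphi_n(\theta_{j,l})$, with $(\lambda_i)_{i \leq N_n(\Lambda_0)}$ a net of $\Lambda_0$ with radius $u_n$, and for all $j \geq MM_n$, $(\theta_{j,l})_{l \leq N_{n,j}}$ a $\zeta j\varepsilon_{n,0}$ net of $S_{n,j} = \{\theta : j\varepsilon_{n,0} \leq d(\theta, \theta_0) \leq (j+1)\varepsilon_{n,0}\} \cap \Theta_n(\lambda_i)$. By assumption (C2), $\log N_{n,j} \leq c_1 n j^2\varepsilon_{n,0}^2/2$ and $\log N_n(\Lambda_0) \leq c_3 n\varepsilon_{n,0}^2$ (for some $c_3 > 0$). Then we have for any $c_2 > 0$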


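\[ E^n_{\theta_0}\left[ \frac{H_n(\hat{\lambda}_n)}{m(\mathbf{x}_n|\hat{\lambda}_n)} \right] \leq P^n_{\theta_0}(\hat{\lambda}_n \notin \Lambda_0) + E^n_{\theta_0}(\varphi_n) + P^n_{\theta_0}\big[ m(\mathbf{x}_n|\hat{\lambda}_n) < e^{-c_2 n\varepsilon_{n,0}^2} \big] + e^{c_2 n\varepsilon_{n,0}^2} E^n_{\theta_0}\Big[ (1 - \varphi_n)\sup_{\lambda \in \Lambda_0} H_n(\lambda) \Big]. \tag{4.7} \]

We assumed that the first term tends to zero (see Theorem 2.1 for verification of this condition in the case of the MMLE). Furthermore by construction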


Eg^ i^Pn) < A^n(Ao) SUp ^ 

* i>MM„ 


gCinfel{\i)/2^-cinf£l{\i) < 


g-nciM2e2 g/4^ 


Also 


P^^[m{^n\Xn) < e-'=^<0] < p-[m(x„|Ao) < = o(l) 

following from (4.1) with C 2 > C 3 + M\{ag + 2M| + 2). The control of the 
last term of (4.7) follows from the proof of Theorem 1 of [9]. 

4.3. Proof of Theorem 2.3. As a first step, for notational convenience, let us denote by $B_n$ the sets $\{\theta : d(\theta, \theta_0) \geq MM_n\varepsilon_{n,0}\}$ or $\{\theta : d(\theta, \theta_0) \leq \delta_n\varepsilon_{n,0}\}$. Then

\[ \Pi(B_n|\mathbf{x}_n) = \int_{\Lambda_0(M_n)} \Pi(B_n|\mathbf{x}_n, \lambda)\pi(\lambda|\mathbf{x}_n)d\lambda + \int_{\Lambda_0(M_n)^c} \Pi(B_n|\mathbf{x}_n, \lambda)\pi(\lambda|\mathbf{x}_n)d\lambda \leq \sup_{\lambda \in \Lambda_0(M_n)} \Pi(B_n|\mathbf{x}_n, \lambda) + \int_{\Lambda_0(M_n)^c} \pi(\lambda|\mathbf{x}_n)d\lambda. \tag{4.8} \]

Then from the proofs of Theorem 1 of [9] and Theorem 2.2 it follows that the expected value of the first term on the right hand side of the preceding display tends to zero. We note that assumption (H2) is needed to deal with the denominator in the posterior, unlike in Corollary 2.1, where weaker assumptions were sufficient following from the definition of the maximum marginal likelihood estimator $\hat{\lambda}_n$.

Hence it remains to deal with the second term on the right hand side of (4.8). The hyper-posterior takes the form

\[ \pi(\lambda|\mathbf{x}_n) \propto m(\mathbf{x}_n|\lambda)\pi(\lambda) \]

and from the proof of Theorem 1 of [9] (pages 10-11) and (4.6) in the proof of Theorem 2.1 we have with $P^n_{\theta_0}$-probability tending to one that

m(x„|A) > for A € Aoiwn), and 

m{-Kn\X) < for A G \ Ao(M„), 


for any w'^ = o(M^ A w'^), hence there exists w'^, which also satisfies Wn = 
o{w'^). Therefore with -probability tending to one we also have that 


' A7i\Ao {Ain') 


7r(A|x„)(iA < 


' 71,0 


,-(co-|-2M|)w)2n£2 


° ko{w„) 


= 0 ( 1 ). 


Finally similarly to the preceding display we have that 


lA\Ar.Elm{^n\XMX)dX 

Ea. / 7r(A|x„)dA < . -—— + o(l) 


^^0 


^A\Ar, 


-{co+2Ml)wlnel f^ r 

Jj 


< p(co+2M|+l)ti2„4,o 


Ao(<in) 


lA\An 


n{X)dX 
fr{X)dX + o(l) = o(l), 


finishing the proof. 


APPENDIX A: PROOF OF THE LEMMAS ABOUT THE RATE $\varepsilon_n(\lambda)$

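A.1. Proof of Lemma 3.1. We have $\|\theta - \theta_0\|_2^2 = \sum_{j=1}^{k}(\theta_j - \theta_{0,j})^2 + \sum_{j=k+1}^{\infty}\theta_{0,j}^2$, so that $\|\theta - \theta_0\|_2^2 \leq K^2\varepsilon_n^2$ if and only if $\sum_{j=1}^{k}(\theta_j - \theta_{0,j})^2 = \|\theta - \theta_{0,[k]}\|_2^2 \leq s_n^2$, with $s_n^2 = K^2\varepsilon_n^2 - \sum_{j=k+1}^{\infty}\theta_{0,j}^2$ and $\theta_{0,[k]} = (\theta_{0,j}, j \leq k)$. Then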



- ^'oJfcllb < d}d9 < 


IT 






’°r(A:/2 + l) 

^k/2^k 


T{k/2 + l) 


with $\underline{g} = \inf_{B_k(s_n)} g(x)$ where $B_k(s_n) = \{x;\ \min_{j \leq k}|x - \theta_{0,j}| \leq s_n\}$. The Stirling formula implies that both the lower and upper bounds have the form $\exp\{k\log(Cs_n/\sqrt{k})\}$ and since $s_n = o(1)$ this is equivalent to $\exp\{k\log(s_n/\sqrt{k})(1+o(1))\}$. We thus have


^niA) P 



and 


\[ n\varepsilon_n^2(k) = k\log(\sqrt{k}/s_n)(1 + o(1)), \]

with $s_n^2 = K^2\varepsilon_n^2(k) - \sum_{j=k+1}^{\infty}\theta_{0,j}^2$. In other words $s_n^2 > 0$ and

\[ s_n^2 + \sum_{j=k+1}^{\infty}\theta_{0,j}^2 = \frac{K^2 k}{n}\log\Big(\frac{\sqrt{k}}{s_n}\Big)(1 + o(1)). \tag{A.1} \]

Also if Yl^=k+i ^Oj ~ o{k\ogn/n), then (A.l) implies that 

4 = ^^log (l + o(l)) ^ sl = ^^log{2n/K‘^){l + o{l)). 

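Now take $\theta_0 \in \mathcal{H}_\infty(\beta, L) \cup \mathcal{S}^{\beta}(L)$. Since $\sum_{i>k}\theta_{0,i}^2 \lesssim k^{-2\beta}$, choosing $k = \lfloor (n/\log n)^{1/(2\beta+1)} \rfloor$ leads to $\varepsilon_{n,0} \lesssim (n/\log n)^{-\beta/(2\beta+1)}$. Finally considering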

= (1 + for implies that this is also a lower bound in 

this case. Furthermore for all 6n = o(l/M„) and for all k such that 

f^-2/3 felog^ ^ M^(n/log ^ A: < M^(n/log 

n 

and 5^ {k~‘^^ + /clogn/n) = o{k~‘^^) = J so that 

\[ \Pi\big( \|\theta - \theta_0\| \leq \delta_n\varepsilon_n(k) \,\big|\, k \big) = 0 \]

and condition (2.12) is verified.
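The balance between the truncation bias and the $k\log n/n$ term in this proof can be illustrated numerically. The sketch below takes the worst-case tail bound $k^{-2\beta}$ as if it were exact and sets all constants to one (both are simplifying assumptions), then minimizes the sum over integer $k$. The function names are hypothetical.

```python
import math


def truncation_objective(k, n, beta):
    """Simplified squared rate: tail bound k^(-2*beta) plus the k*log(n)/n term."""
    return k ** (-2.0 * beta) + k * math.log(n) / n


def oracle_truncation(n, beta):
    """Integer k in {1, ..., n/log n} minimizing the simplified squared rate."""
    k_max = max(2, int(n / math.log(n)))
    return min(range(1, k_max + 1), key=lambda k: truncation_objective(k, n, beta))


def continuous_minimiser(n, beta):
    """Stationary point of the objective in continuous k: (2*beta*n/log n)^(1/(2*beta+1))."""
    return (2.0 * beta * n / math.log(n)) ** (1.0 / (2.0 * beta + 1.0))


n, beta = 10**6, 1.0
k_hat = oracle_truncation(n, beta)
squared_rate = (n / math.log(n)) ** (-2.0 * beta / (2.0 * beta + 1.0))
print(k_hat, continuous_minimiser(n, beta))
print(truncation_objective(k_hat, n, beta) / squared_rate)  # bounded ratio
```

The minimizing $k$ is of order $(n/\log n)^{1/(2\beta+1)}$, and the minimal value of the objective is a constant multiple of $(n/\log n)^{-2\beta/(2\beta+1)}$, in line with the choice of $k$ made in the proof.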


A.2. Proof of Lemma 3.2. We need to study 


inf 

||h-0ol|2<£n 




Let us distinguish three cases $\beta > \alpha + 1/2$, $\beta < \alpha + 1/2$ and $\beta = \alpha + 1/2$, and note that the following computations hold both for the truncated and non-truncated versions of the priors (T2) and (T3).

In the case f3 > a + 1/2 and if ^ for all i, then 


inf 

||/i-0o||2<e 


< r 


-2 


Lt 


-2 


T ,-2a-2/3 ^ 

^ ~/3-a-l/2 

2 = 1 ' 


while when Oq G Sp{L) inf,,gH“.-: ||/i- 0 o|| 2 <£ II^IIhi“>- ^ ^Iso 
if $\theta_0 \in \mathcal{H}_\infty(\beta, L)$, while


___ ___ 1 \l /2 

2a + 1^2a + l < £^(^a,T) < U 2a + l ^ 2a + l _|_ j __ 


nr^ 


if 00 £ Sp{L). Now, if 0 < /3 < a + 1/2, with 0o £ T~Loo{l3,L) 


V 2 ^ / 


inf 

hgH“'’': \\h—9Q\\2<e 

and when 0o £ Sp{L) 
inf 




2 a+i r ^6 


2Q-2/3 + 1 


i=l 


2a + 1 - 2/3 


L,. < r-2L^e-(2“-2/3+i)//3. 


IH“'^ — 


||/ i -6» o ||2<£ 

If /3 = a + 1/2, the same result holds for 0o € <Si 3 {L), but it becomes 

r-^L. 


inf 

||/i—0o||2<e 


< 




-|log(e)|(l + o(l)). 


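when $\theta_0 \in \mathcal{H}_\infty(\beta, L)$. These lead to the upper bound in (3.5) and (3.6).

Furthermore for every $\theta_0 \in \mathcal{S}^{\beta}(L) \cup \mathcal{H}_\infty(\beta, L)$ satisfying $\|\theta_0\|_2 \geq 2\varepsilon$, when $\|h - \theta_0\|_2 \leq \varepsilon$ then $\|h\|_2 \geq \|\theta_0\|_2/2$, hence

\[ \inf_{h \in \mathbb{H}^{\alpha,\tau}:\ \|h - \theta_0\|_2 \leq \varepsilon} \|h\|^2_{\mathbb{H}^{\alpha,\tau}} \geq \tau^{-2}\inf_{h \in \mathbb{H}^{\alpha,\tau}:\ \|h - \theta_0\|_2 \leq \varepsilon} \|h\|_2^2 \gtrsim \|\theta_0\|_2^2\tau^{-2}. \]

Hence if $\|\theta_0\|_2 > 2\varepsilon_n(\alpha, \tau)$ for $a(\alpha, \beta)$ defined in Lemma 3.2,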

|| 0 o ||2 ^_a/( 2 a+l)^l/( 2 a+l) 


^n('^) 


> 


Vnr'^ 


and for all $\tau^2 n$ lower bounded by a positive constant the above inequality remains valid when $\|\theta_0\|_2 \leq 2\varepsilon_n(\lambda)$, providing us with the lower bound in (3.5) and (3.6).

A.3. Proof of Lemma 3.3. The proof is based on minimizing the upper bounds obtained in Lemmas 3.1 and 3.2.

• First consider $\lambda = \tau$. When $\beta > \alpha + 1/2$, note that for all $\tau \geq n^{-1/(4\alpha+4)}$

\[ \Big(\frac{1}{n\tau^2}\Big)^{1/2} \lesssim n^{-\alpha/(2\alpha+1)}\tau^{1/(2\alpha+1)} \]

so that $\varepsilon_n(\tau) \asymp n^{-\alpha/(2\alpha+1)}\tau^{1/(2\alpha+1)}$, which is minimized at $\tau \asymp n^{-1/(4\alpha+4)}$, so that (3.7) is verified. Following from (3.5) the lower bound is obtained with every $\|\theta_0\|_2 \geq c > 0$, for any arbitrary positive constant $c$. Indeed in this case, we have $\varepsilon_n(\tau) \gtrsim (n\tau^2)^{-1/2}$, which implies that the lower bound is the same as the upper bound (3.7). Furthermore we note that the lower bound

\[ \varepsilon_{n,0} \gtrsim n^{-(2\alpha+1)/(4\alpha+4)} \tag{A.2} \]

holds for every $\theta_0 \neq 0$ (and large enough $n$). Therefore we also have for every $\tau_0$ satisfying $\varepsilon_n(\tau_0) \lesssim \varepsilon_{n,0}$ that $\tau_0 \gtrsim n^{-1/(4\alpha+4)}$.

When $\beta < \alpha + 1/2$ we have for all $\tau \geq n^{(\alpha-\beta)/(2\beta+1)}$ that $\varepsilon_n(\tau) \asymp n^{-\alpha/(2\alpha+1)}\tau^{1/(2\alpha+1)}$, which is minimized at $\tau \asymp n^{(\alpha-\beta)/(2\beta+1)}$, leading to the

upper bound (3.8). The upper bound is obtained choosing for instance 
00 ,i = for all i < Kn, for some sequence going to infinity, 

so that 


-^n 

^ ^ - 20o,i(0o,i - hi)] 

\\h-e\\2<£niT) ^ 

> r-2 [LKl^-^h+i _ 2VLEn{T)Kl^-h+^) 

> T--2 T^2a-2p+l 

' -‘'■n 

and Kn < koEniT)~^^h_ xhis leads to Enir) > with an extra 

$\log n$ term in the case $\alpha + 1/2 = \beta$ and $\theta_0 \in \mathcal{H}_\infty(\beta, L)$, so that the lower bound is of the same order as the upper bound (3.6), which in turn implies that the lower bound is the same as the upper bound (3.8).

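We now consider the case $\lambda = \alpha$. Then we have a generic upper bound for $\varepsilon_n(\alpha)$ of the form $n^{-(\alpha \wedge \beta)/(2\alpha+1)}$ following from (3.5) and $\theta_0 \in \mathcal{H}_\infty(\beta, L) \cup \mathcal{S}^{\beta}(L)$, while the lower bound is a multiple of $n^{-\alpha/(2\alpha+1)}$. We thus have $\varepsilon_{n,0} \lesssim n^{-\beta/(2\beta+1)}$ for all $\theta_0 \in \mathcal{H}_\infty(\beta, L) \cup \mathcal{S}^{\beta}(L)$ and the constant depends only on $\beta$ and $L$.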

A.4. Proof of Equation (3.17) in Proposition 3.6. We prove the first part of the proposition, namely the bounds on $\varepsilon_n(k)$. Denote by $g_0$ the function

k 

go{x) = k'^fij'W.j^ix), 
i=i 
then $g_0$ is the projection of $\sqrt{f_0}$ on the set of piecewise constant functions on a $k$ regular grid and for any $\theta \in \mathcal{S}_k$, the $k$-dimensional simplex,


h^{kJe) = gl) + > h'^{fo,9o) = 

i=i 


Define 9j^k = / Yli foi’ some Vn = o(l) consider 9 = {9i,9^) £ 

Sk satisfying \9j-9jk < Gj^kVn for j < A:-l. Then |6»fc-4,fc| < Oj,kVn < 

Vn- Note that b{k)‘^ = 1 — Yl^j=i so that 


= Y 

i=i i=i 


9j — \l 9i 


lYvfk 


— ^ ^ "Yj 


lYkfk-lf 


i=i 

,2 


i=i 


< 2< + 2b{ky 


which implies that for such 9, h‘^{fo, fe) < ^b{ky + 2u^. Since cq < /o < Cq, 
Co/A: < 9j^k Z Co/k and we also have, as in the proof of Lemma 6.1 of [12], 
that if Vn < Co/(2A:), then Vn < 9k,kl‘^ and 


^ (l^i “ ^j,k\ < 0j,kVn, '^j <k - l) 


n.i 


^i,k (1+^n) 


> 


> 


iCiVn)'^r{ka) 


x^-^dx 


n 


{C2Vn)'‘T{ka)k 


—ka 


(ar(Q:))^ ^r(a) ’ 
for some constant Ci,C 2 > 0. Since a < A, if = n~^ for some /i > 0, 
vr {\0j - Ojk < ^j,kVn, yj < k - l) > 

which implies that for all k such that b{ky < A:logn/n we have £n{ky < 
b{ky + klogn/n. We now bound from below £n{k). Since h‘^{fo, fe) = h{k)‘^ + 


Ej=i Y%,fc\/1 - 6(A:)2j , on the set h^ifoje) < 4: Hk) < 4 

and “ V ~ h{k)‘^\ < e^. Using elementary algebra and 
Cauchy-Schwarz inequality we have if is small, b{k) is small and




so that if h{fo, fe) < small enough, then 

2 



Hence 


< Ksnikf} < n(^ ( < KSnikf - b{kf), 

i=i ^ ' 

with b{k)‘^<KEnik)"^. Set = Ken(k)‘^ — b{k)'^. On the set 



< s 


2 

n’ 


we split {I,--- ,k — 1} into \^/0~j — < ^iVk and \\/0~j — > 

l/'/k. The cardinality of the latter is bounded from above by s'^k. Moreover 
if \\/^ ~ < ^iVk then by triangle inequality ^JWj < IjVk else

1/^ ~ 'Sn- We have 

U( - 0 ^^^? < s'^] < fk\^a-l)lj^-(k-l){a-l/2) 

^j,k) <^n) < r(a)fcr(A:/2 + l) ^ VV " 


< 


7r2r(aA:)s 


k 

n _/ u-k(a-l/2) 

+ l)\ 


r(a)^r(A :/2 + 1 ) 

^ ^gUog(fc)+2/-(fc-Z)(a-l/2) log(fc)+2i(a-l/2) log(sn) 


l<slk 


k 1 

< exp <j ak log(A:) — k log r(a) — — log(A:) + k log(s„) — k{a — -) log k + 0 {k) 


{ 

< exp(A:log(sn) + 0 {k)) 

if a > 1 / 2 . If a < 1 / 2 , for each 0 split {I,-- - ,/c — 1 } into the set S 
of indices where 6 i > Pn/k and its complement, with pn = o(l)- The 
number of indices such that 6 i < Pnjk is bounded by 0 {s‘^k) on the set 


“ V ) < Sn! so that 


^ r(fcoi) f 


r(a)^ 


If Vies T\0r"d9, [ llvies^ IT 

a \ Pn •*■■*• ! Q ^ Pn •*• ■*• 

,cQ Wj<-^jg 5 c 


< 


< 


T{ka) 

r(a)^ 


E 


U Vies H' 


- 0 ,J 


^dui 


T{ka) 


r(a) 


E 


l>k{l-sl) 


T{l /2 + l) \l 


a^e 


l log^ *° ^ log l—{k—l) log{k—l)+0{k) 


<-^^p^(pjkr E 

(aria))'' ^ 

< exp{fcalog(p„) + k\og{Sn/y/p^) + 0 {k)} < e*:l°g«>^-fc(l/ 2 -a) logp„+ 0 (fc)_ 
Hence, choosing | log pn\ = o(| log s^l) leads to 

k / ,_X 2 

fc(l+o(l)) log s„ 


nlEIVS 

vi=i 


k] < Sn < e 
so that s^l logSnl > k/n and > k/n\og{n/k).
SUPPLEMENTARY MATERIAL

Asymptotic behaviour of the empirical Bayes posteriors associated to maximum marginal likelihood estimator: supplementary material

This is the supplementary material associated to the paper Rousseau and Szabo [24]. We provide here the proofs of Propositions 3.1-3.6, together with some technical lemmas used in the context of priors (T2) and (T3) and some technical lemmas used in the study of the hierarchical Bayes posteriors. Finally some lemmas used in the regression and density estimation problems are given.

B. Proof of the Propositions. 

B.1. Proof of Proposition 3.1. It is sufficient to prove that all conditions of Theorems 2.1, 2.2, 2.3, and Corollary 2.1 hold, since then the proposition follows from combining them with Lemmas 3.3 and 3.5.

As a first step we note that since there are only finitely many truncation parameters ($|\Lambda_n| = o(n)$) there is no need to introduce a change of measure $\psi_{k,k'}$; one can simply take $Q^{\theta}_{k,n} = P^n_{\theta}$. Furthermore, we also have from $N_n(\Lambda_n) = o(n)$ and $n\varepsilon_{n,0}^2 \geq m_n\log n$ that $\log N_n(\Lambda_n) \leq \log n = o(n\varepsilon_{n,0}^2)$.

Next we define for all $k \leq \epsilon n/\log n$, with $\epsilon > 0$ fixed but arbitrarily
small, the set Qnik) = {0 £ M^;maxj|0j| < so that the 

exponential moment condition on g implies that 

UiQnikrik) < ke-^"^<0^ if ^2 < ^^^2^ 
and condition (2.6) holds. Furthermore, following from Lemma 3.1 and

\[ \log N(\zeta\varepsilon_n(k), \Theta_n(k), \|\cdot\|_2) \lesssim k\log n, \]

for every $\zeta \in (0,1)$, there exists a large enough constant $c(k) = K$ such that the entropy is bounded from above by $c(k)^2 n\varepsilon_n(k)^2/4$. We note that by slicing up the set $\Theta_n(k)$, see for instance the proof of Proposition 3.3, the upper bound on the entropy would hold for any $c(k) = K > 0$.

From [1] we have that

\[ 2KL(\theta_0, \theta) = V_2(\theta_0, \theta) = n\|f_{\theta_0} - f_\theta\|_n^2 = n\|\theta - \theta_0\|_2^2 \]

so that (B1) holds with $M_2 = 1$ and $\Lambda_0 = \{k_n\}$ where $k_n \in \{k : \varepsilon_n(k) \leq M_1\varepsilon_{n,0}\}$. Then conditions (A2), (C1)-(C3) follow from [13] with $d_n(f_\theta, f_{\theta_0}) = \|f_\theta - f_{\theta_0}\|_n$ the empirical $L_2$-distance, which is also equal to the $\ell_2$ norm $\|\theta - \theta_0\|_2 = \|f_\theta - f_{\theta_0}\|_2$ (from Parseval inequality). Finally condition (2.12) is proved in Lemma C.2 and (H2) in Lemma E.4.

B.2. Proof of Proposition 3.2. Similarly to Proposition 3.1 it is sufficient 
to verify that all the conditions of Theorems 2.1, 2.2, 2.3, and Corollary 2.1 
hold. 

Take Un < n“^/logn for A = a and Un ^ for A = r. Since 

$n\varepsilon_{n,0}^2 \geq m_n\log n$ and $N_n(\Lambda_n) \leq n^H$ for some $H > 0$, $\log N_n(\Lambda_n) = o(n\varepsilon_{n,0}^2)$. Furthermore condition (B1) follows from Proposition 1 of [1] with $M_2 = 1$.
The proofs of conditions (A1) and (A2) are given in Lemma C.1, Lemma
E.3, and Lemma E.l with ci = 1/2, = 1/18, c(A)^ = > IQjx/ci (where 

/i is dehned in Lemma E.l), and d{ 6 i, 62 ) = ||0i — 02||2- Condition (H2) holds 
following from Lemma E.4 with C3 = 2 + j^. Finally for Corollary 

2.1 conditions (C1)-(C2) follow again from the preceding lemmas with M > 
10^/^y^, C2 = /i, since Wn£n,o = o(en(A)) for all A G A„\ Aq. Note also that 
from the proof of Lemma E.l we also have for Un < that \\0 — II2 = 
o{n~^) = o{en,o), for every - 6»o|| = 0(1). 

The lower bound in the case $\alpha + 1/2 < \beta$ follows from Theorem 2.2 and Lemma 3.4, since condition (2.12) is proved in Lemmas C.2 and E.1.

Finally we note that the same results hold for the Gaussian white noise model as well. The proof can be easily derived from the proof for the regression model, by substituting $e_j(t_i)$ by $\delta_0(i - j)$ (where $\delta_0$ is the Dirac-delta measure) in Lemmas E.3 and E.1 and taking $\sigma^2 = 1/n$ (in this case $c_3 = 2 + 3L^2/2$). Furthermore one can choose $\zeta = c_1 = 1/2$ in the testing assumption (A2) by using the likelihood ratio test in the Gaussian white noise model, see for instance Lemma 5 of [13].

B.3. Proof of Proposition 3.3. The proof consists in showing that assumptions (A1), (A2bis), (B1) and (C1)-(C3) are verified.

In the case of prior (T1), there is no need to consider a change of measure since $\Lambda_n$ is finite, so that $N_n(\Lambda_n) = o(n)$. Then similarly to the proof of Proposition 3.1 we have that $\log N_n(\Lambda_n) = o(n\varepsilon_{n,0}^2)$.

We first prove (B1), or more precisely the variation of (B1) given in Remark 2.2. Choose $k_0 \in \Lambda_0$ which verifies $\varepsilon_{n,0} \leq \varepsilon_n(k_0) \leq M_1\varepsilon_{n,0}$ for some $M_1 \geq 1$. We have for all $k$ and all $\theta \in \mathbb{R}^k$ that $\|\theta\|_1 \leq \sqrt{k}\|\theta - \theta_0\|_2 + \|\theta_0\|_1$. Now let $\theta_0 \in \mathcal{H}_\infty(\beta, L) \cup \mathcal{S}^{\beta}(L)$ with $\beta > 1/2$, then $\|\theta_0\|_1 < +\infty$, and if $k_0 \in \Lambda_0$ satisfies $\varepsilon_{n,0} \leq \varepsilon_n(k_0) \leq M_1\varepsilon_{n,0}$ for some $M_1 \geq 1$, then

£n{ko) < V /cq \/koen{ko) = 0(1), 

Vn 

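so that

$$\{\|\theta - \theta_0\|_2 \le K\varepsilon_n(k_0)\} \subset \{\|\theta\|_1 \le M\},$$

if $M$ is large enough. Moreover, using Lemma E.1, for all $M > 0$,

$$\{\|\theta - \theta_0\|_2 \le K\varepsilon_n(k_0)\} \cap \{\|\theta\|_1 \le M\} \subset B(\theta_0, M_2\varepsilon_n(k_0), 2),$$

and (B1) is verified.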
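The step from an $\ell_2$ bound to an $\ell_1$ bound used in the verification of (B1) rests on the Cauchy-Schwarz inequality in dimension $k$. The following is a minimal numerical sketch of that inequality only, not of the authors' argument; the helper names, the dimension `k = 50`, and the choice of a truncated, polynomially decaying `theta0` are assumptions made purely for illustration.

```python
import math
import random


def l1_norm(v):
    """Sum of absolute values of the coordinates."""
    return sum(abs(x) for x in v)


def l2_norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))


def l1_bound(theta, theta0):
    """Right-hand side sqrt(k) * ||theta - theta0||_2 + ||theta0||_1 (k = len(theta))."""
    k = len(theta)
    diff = [a - b for a, b in zip(theta, theta0)]
    return math.sqrt(k) * l2_norm(diff) + l1_norm(theta0)


# Hypothetical example: theta0 has summable coefficients, theta is a perturbation.
random.seed(0)
k = 50
theta0 = [(j + 1) ** -1.5 for j in range(k)]
theta = [t + random.gauss(0.0, 0.1) for t in theta0]

print("||theta||_1 =", l1_norm(theta))
print("bound       =", l1_bound(theta, theta0))
```

The bound is loose by the factor $\sqrt{k}$, which is why the argument needs $\sqrt{k_0}\,\varepsilon_n(k_0) = O(1)$ to keep $\|\theta\|_1$ bounded.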

We now verify assumption (A1). We have Qkn~fe fo^ ^ ^ thus
(2.5) is obvious and (2.6) follows from [23] (verification of condition A),
with

Qnik) = {9e II0II2 < Rn{k)}, Rn{k) = Ro{nen{kff/P\ 

for some $R_0 > 0$ large enough. Similarly, the tests in (A2) are the Hellinger
tests as in [12], so that (2.7) is satisfied.

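We now study the change of distance condition of the version (A2bis)
of condition (A2). Define $B_{n,j}(k) = \{\theta \in \Theta_n(k);\ \|\theta - \theta_0\|_2 \in (j\varepsilon_n(k), (j+1)\varepsilon_n(k))\}$
for $j \ge K$ and let $\theta \in B_{n,j}(k)$. Since $\|\theta_0\|_2 < +\infty$, $B_{n,j}(k) \ne \emptyset$ only
if $j \le 2R_n(k)/\varepsilon_n(k)$. Note also that $\sqrt{k}\,\varepsilon_n(k) \lesssim k\sqrt{\log n/n} \lesssim 1$.

For all $j \le j_0(\sqrt{k}\,\varepsilon_n(k))^{-1}$ with $j_0 > 0$ we have $\|\theta - \theta_0\|_1 \le \sqrt{k}\,\varepsilon_n(k)(j+1) \le j_0 + 1$.
Using Lemma F.1 in the Appendix, we obtain that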

d{h,fe) > - 0OII2 > 

So that c{k,j) = Moreover using [23], p. 8 

d{fe, fe') < p - d'h < ||0 - e'\P 

so that if 110 - 0'||2 < d{fe.,fe') < as 

soon as k or jo is large enough. Thus 

\ogN{(:c{k,j)£n{k),BnPk),d{;-)) < \og N [Ce-^'^dB+D , B^^k) ^ • lb) 

<k = o{n£l{k)). 

Hence, for n large enough we have for k € An \Aq 
(B.l) 

g-cinc(fc,j)2e„(fc)2/2 g-cie =lGo+l)ne„(fc)2/4 ^ ), 

K<j<jo/{Vken(k)) 

as soon as $w_n = o(M_n)$. Now consider $j > j_0(\sqrt{k}\,\varepsilon_n(k))^{-1}$ and let $\theta \in B_{n,j}(k)$;
from equation (16) in the proof of Lemma 3.1 of [23],

d{fo, fe) > P - 9o\\2 {yk£n{k)j + | log(jen(A:))|) . 

For all j > \og{k)/{^/k£n{k)) we have Vk£n{k)j > \ log(je„(A:))| and 

d{fo, fe) ^ 

when n is large enough and we can choose c{k,j) = ck~^^‘^£n{k)~^. For all 
9,9' £ Bn,j{k), using equation (8) of [23] 


(B.2) 


d{fe,fe')<^P-9'\\2, 
so that there exists c > 0


logN{C^-^,BnAk),d{;-))<logN{c"-^,BnAk),\\ • II2) 


< 


k log{jk) = o{n/k) 


for all j < Rn{k)/en{k) and for all Ci,C 2 > 0 
(B.3) 

[R„{k)/en{k)] T2 (l,\ n 

g-C'2n/A: ^ ^-C- 2 .nlk —!i_ < g-^-^ ji large enough. 

j=\Ci log k/(Vk£n{k))'\ 


£n{k) 


n. 


Combining (B.3) with 

n^/(^^’''^)(logre)^^/( 2 ^+i) > ree^ 0) ^ > \/n, V/c < 

K 

implies that 

LRn(fc)/£n(fc)J 

(B.4) ^ g-C2n/fc ^ 

j=\Cl log fc/(\/fc£„(fc))] 

when w'^ = with 0 < 5 < (3 — 1/2. 

We now consider $j_0/(\sqrt{k}\,\varepsilon_n(k)) \le j \le \delta\log(k)/(\sqrt{k}\,\varepsilon_n(k))$ with $\delta$
arbitrarily small. Then

j£n{k) 


d{fo, fe) > ll^' - 6 »o ||2 (I log(je„(A:))|) 


-1 


> 

logn 


so that c{k,j) > j/logn. Note also that, similarly to before, this implies 
that d{fo,f 0 ) > dn£n,o as soon as k < ko{n /for all ko > 0 
and n large enough. Using (B.2), log , Bnj{k),d{-, •)) < /clog(/i:). 

Moreover 


nc{k,jYen{k) 


2^ n ..\2 ^ nsnikff ^ 


> 

('N_/ 




(logn)^ (logn)2A: 

for all j > jo/{Vk£n{k)). By choosing jo large enough, we thus have that 
for ko fixed and all k < ko\/n{\ogn)~^, 

logiV(C^^^,^n,j(fc),d(-,-)) < cinc{k,jf£n{kf/2. 


We also have that 


(B.5) 


[Cl logfc/(vTen(fc))l+l 

E 

j=\3o{'/k£n(k))-^'\ 


C 2 n- 

e 2^ < o(e 
Combining (B.1), (B.4) and (B.5), we finally prove (A2bis).

We now verify conditions (C1)-(C3) to obtain the posterior concentration 
rate. We already know from Lemma 3.1 that ^ (^/ log where 

the constant depends only on L,/3i, and (32 if 9o G . Since we do not 

need the change of measures $\psi_{\lambda,\lambda'}$, (C1) and (C2) are proved in [23].

Finally for the lower bound on the contraction rate condition (2.12), take 
G 'H{I3,L), so that (n/logn)“^/(^^+^) and if ||6< —0o||2 < 

M6en,o with 0 G and M > 0 then k > 5^ log The above 

computations imply also that if there exists k < 5n {nj log then 

d{fo,fe) > ll^* - ^olb on the set {d{fo,fd) < SnEnfi} so that Lemma 3.1 
implies condition (2.12). 

For the hierarchical Bayes result, assumption (H2) is verified in Lemma F.3.

B.4. Proof of Proposition 3.4. To prove Proposition 3.4 we need to verify
that (A1)-(A2) and (B1) are satisfied, together with (C1)-(C3), (2.12) and
(H2). Let $\tau_0 \in \Lambda_0$ satisfy $M_1\varepsilon_{n,0} \ge \varepsilon_n(\tau_0)$. Equation (F.4) in Lemma F.2,
with Kn = implies that for M > ||0o||i) 

n(|| 0-0o||2 <i^en(ro);||0-0o||i < A:v^e„(ro) + 2M|a,ro) > 

Moreover y/KnSniTo) < (r^ne^ Q)^/l^“le„(ro) and using Lemmas 3.2 and 3.3 
if /3 < a + 1/2, 

G'L^n{To) < + 

(2P-l)(2a + l) 

= n 40(2/3 + 1) = 

Similarly if /3 > a + 1/2, 

pKne„(r,) < „VI++1)H-(2„+1)/(4„+4)^1/(2+_ 

V^^niro) < n 4(0+1) = o(l). 

So that for n large enough, 

n (||0 - Ooh < Ksniro); p - 0o||i < 3M|a,ro) > e-”'^'<o/2 

and using the same computations as in the verification of (B1) in Section
B.3 we obtain

$$\{\|\theta - \theta_0\|_2 \le K\varepsilon_n(\tau_0)\} \cap \{\|\theta\|_1 \le M\} \subset B(\theta_0, M_2\varepsilon_n(\tau_0), 2).$$
We now verify (A1), (A2), (C1)-(C3). Consider the transformation defined
in (3.11). Then


(B.6) 


< — -c(6») I , 


if t' > T. 


0 n(r) = {e = e„(r) 0 i + f?„(r)02, S Bi, 02 G H[} n {|| 0 i||i < 
with $R_n(\tau)$, $B_1$, $\mathbb{H}_1^{\tau}$ defined in Lemma C.1. Lemma C.1 implies that
(B.7) n(0„(T)"|a,T) < 


For all 6 G ©^(t), 


||0||i < en(T)||0i||i+i?n(T)||02||i < \/nen('r)+i2n(T)r(||02 ||h-+ « < C{T)y/nen{T), 

so that if T < t' < r(l + Un) with Un = 

(B.8) < (l + o(l))/,-(xJ 


and ri^^) < 2 for n large enough, and condition (2.5) in (A1) is satisfied
with $\tau_i = \tau_n(1+u_n)^i$ the smallest point in the $(i+1)$th bin $[\tau_n(1+u_n)^i, \tau_n(1+u_n)^{i+1}]$
on $\Lambda_n$. Using (B.8), we also have that

[ g®„(A")dn(0|r) < / 

J&n(rY J^n(rY 


< n(0„(T)‘=|a,r)^/2 (^j e2”“'‘ll^«®*^ll°°(in(0|r)^ 

< g-cn4U)/2 JJ 

i 


if Un = o(n“^T“^), and condition (2.6) is verified. 
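The geometric binning of the scale hyper-parameter used above for condition (2.5) can be sketched numerically. This is an illustrative sketch only: the function name `geometric_grid` and the numerical values of the interval end points and of `u_n` are hypothetical, and the actual choice of $u_n$ in the proof is not reproduced here.

```python
import math


def geometric_grid(tau_min, tau_max, u_n):
    """Left end points tau_min * (1 + u_n)^i of the bins covering [tau_min, tau_max].

    Bin i is [tau_min * (1 + u_n)^i, tau_min * (1 + u_n)^(i + 1)], so any tau in a
    bin satisfies tau_i <= tau <= tau_i * (1 + u_n), where tau_i is its left end point.
    """
    n_bins = math.ceil(math.log(tau_max / tau_min) / math.log1p(u_n))
    return [tau_min * (1.0 + u_n) ** i for i in range(n_bins)]


# Hypothetical values: the number of bins grows like log(tau_max / tau_min) / u_n.
grid = geometric_grid(tau_min=0.01, tau_max=100.0, u_n=0.05)
print("number of bins:", len(grid))
print("log number of bins:", math.log(len(grid)))
```

Because the number of bins is of order $\log(\tau_{\max}/\tau_{\min})/u_n$, a polynomially small $u_n$ and a polynomially large range give a covering number that is polynomial in $n$, so its logarithm is of order $\log n$.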

Similarly to the case of prior (T1), the tests in condition (A2) are the
Hellinger tests as in [12], so that (2.7) is satisfied using (B.8). We now verify
(2.8). Recall that for all $\theta_0 \in S_\beta(L) \cup \mathcal{H}_\infty(\beta, L)$ with $L > 0$ and $\beta > 1/2$,
$\|\theta_0\|_1 < +\infty$. From Lemma F.1,

4/o,/ 0)> l|0-0o||2e-''ll®-^°llL 
Define_0„(r) = 0n(r) n {H^ - 0o||2 > Kenir); \\9 - 6»o||i < y/IQ\\0 - 9o\\2 +

Af} n 0n(T) with M > 2||0o||i and 

0 n('^) = + -RnHl 

with Un = rCa AM and is the Li unit ball and 

Rn ^ nel Qwl/'^ if TC'a^““^'^^^(ne2 oU;^)-("-V2) < ^ gjgg C'a'^^(^)“2(a-l/2) . 

From (B.7), (C.2) and Lemma F.2 

(B.9) n(0^|r) < 


where A can be chosen as large as need be by choosing Cq, large enough. 
If 0 G 0n('7‘) with \\9 — 9q\\2 < MKn^^'^ then from Lemma F.l d{fo, fg) > 
11^ — so that c(t) > Now let \\9 — 9q\\ 2 > MKn ■ Note 

that, from Lemma 3.2, 


£n{T) < + 


/3 A 1 

/ C \ 20+1^2 
\nT^J 


and £^,0 < n, (2«+i)/(4o+4) if ^ ^ ^ + 1/2 and else en,o < n d/(2/3+i)_ This 
implies that for all t < fn = 

(B.IO) < M(logn)~^, 

which combined with II0—00lb > MiFn leads to ||0—0o||2 > (log n)^en(r). 

Theorem 5.1 of [36] implies that either d(fo, fg) > 1 — 1/e or 
V 2 ifo, fe) < Cd\fo, fg) (l + (logn)^ + ||0||2) . 


Moreover, since /o > cq, 

lb(/o, fg) - *=0 ^ ~ ~ 0(/o)) j dx 

= CO f ||0 - 0o||2 + - 0o,,)0(/o)) I > co||0 - 0o||i 

||0-0O||2 


and 


difoJe) 


> 


1 + logn + ||0 - 0o||2V^ 


^ ^n^'^Vlogn 
so that $d(f_0, f_\theta) \gtrsim \varepsilon_n(\tau)\log n$ and (2.8) is verified with $c(\tau) > 0$. To verify
condition (2.9), we need to control the Hellinger entropy. Over the subset
of $\Theta_n(\alpha, \tau)$ defined by $\|\theta - \theta_0\|_2 \le M/\sqrt{K_n}$, $\|\theta - \theta_0\|_1 \le M$ and Lemmas
F.1 and C.1 imply that the Hellinger entropy of this set is bounded by the
$L_2$ entropy, which is bounded by $Cn\varepsilon_n^2(\alpha, \tau)$. If $\|\theta - \theta_0\|_2 > M/\sqrt{K_n}$, the
above computations imply that d{fQ, fg) > (logn)“^. Moreover, if 

6 S 0„(a, r) and ||0 — 9'\\2 < £n{ci,T), then for all J > 0 

(B.ii) \0j-9]\ < yfj\\9-9'\\2 < Vj£n{a,T). 

3<J 

Choose J X £n{oi,T)~‘^. Then, since 9 = 9i + 92 and 9' = 9[ + 02 with 
01,0^ G and 02,02 G .RnUl, 

^ \9j - 0'I < 2Un + 2a"^/^.R„J"" <Un + yj£n{a,TY'^n£l^ QWn, 
j>J+l 

if rCa < M or 

X] 1^1“ ^'3 1 ~ + \/en(a,r)4«rV(«-i/2)^ 

j>J+i 

if > M. So that ||0 — 0'||i = 0(1) as soon as 

£n{ot,TY°'n£^QWn = 0(1). In the case /3 < q:+ 1/2 and r = 

(B.12) 

^-4aV(2«+l).^4a/(2a+l) ^ ^|^^-l/(2/3+l)^^ ^^^2^-4a/3/(2«+l) ^ 

the former relation is satished when r < while the latter requires 

r 3> In the case r > ^ (B.12) is replaced with 

7i-4a^/(2a+i).^4Q:/(2o+i).j-i/(o-i/2) _ which is Satisfied as soon as r < 

n“/2-i/4, In the case /3 > a + 1/2 the same results hold. 

Conditions (C1)-(C3) are direct consequences of the transformation (3.11),
which in turn implies (B.8), combined with the definition of $\Theta_n(\tau)$, so that
(C1) and (C2) hold.

Finally, $N_n(\Lambda_n)$ is at most polynomial in $n$, so that $\log N_n(\Lambda_n) = o(n\varepsilon_{n,0}^2)$.
This completes the proof of the upper bound on the contraction rate of the
MMLE empirical Bayes posterior. Then the lower bound in the case $\beta > \alpha + 1/2$
and $\|\theta\|_2 > c > 0$ follows from the combination of Theorem 2.2 and Lemmas
C.2 and 3.3 together with the fact that when $\theta \in \Theta_n(\tau)$, either $d(f_0, f_\theta) \gtrsim$
\\9 - 9oh or difoJe) > x n-V(4a(a+i))^-i/« > 
Finally, for the hierarchical Bayes result, we note that the transformation
(3.11) implies, as in (B.8), that

^n{lpT,T'{0)) > -2nUn\\(p\\oo ^ +4(0), if t' < T, 

i 

which combined with Lemma F.3 and the definition of implies (H2) as 
soon as M2ne^ q > Mq logn with Mq large enough and < 1/n. Then the 
statement is a direct consequence of Theorem 2.3. 


B.5. Proof of Proposition 3.5. As in the proof of Proposition 3.4, let
$\alpha_0$ replace $\tau_0$ in the verification of (B1) in Section B.4. Equation (F.4) in
Lemma F.2, with Kn = L(^^n implies that for M > ||0o||i, 


n (||0 - 0 o ||2 < i^en(ao); ||0 - 0o||i < i^v^en(«o) + 2M|ao) 


> 


,2 

^ra,0 


Moreover ^Kn£n{cio) ^ (^^n and by using Lemmas 3.2 and 

3.3. (i.e. 7 ),-"o/( 2 q:o+i) < e^(Q.Q) < (n/logn)“^/i^^+^i, oq > /3) we have that 

VKeniro) < (n/ log „)l/[( 2 / 3 +l) 4 ao]-/ 3 /( 2 / 3 +l) ^ 

n (||0 - 0o||2 < Ken{ao)] ||0 - 0o||i < 3M|ao) > e-'^'<o/2. 


As in the case of type (T2) prior,

$$\{\|\theta - \theta_0\|_2 \le K\varepsilon_n(\alpha_0);\ \|\theta - \theta_0\|_1 \le 3M\} \subset B(\theta_0, M_2\varepsilon_n(\alpha_0), 2),$$

for some constant $M_2 > 0$.

To study conditions (A1), (A2) and (C1)-(C3), recall that the change of
variable $\psi_{\alpha,\alpha'}(\theta)$ is defined by (3.12), so that when $\alpha' \ge \alpha$

log/v,^^^,(0)(x) - log/0(x) = ^(4""' - l)ei(pi 

i 

-logI j fe{x)exp (^^(4""' - l)6iipi{x)Jdx^^ 

< 2|a-a'|||0||i|4||oo. 

Let a £ An \Aq and define 

0„(a) = {9 = Asn{a)ei + ii„(a)02, 0i S Bi, 62 £ Mf} n {||0i||i < V^}. 

For all 6 £ 0„(a), 

||0||i < £nioi)\\ 6 i\\i+Rnia)\\ 62 \\i < Vn£nia)+Rnia)\\02\\m<-a~^ < C'v^e„(a), 
for some constant $C$ independent of $\alpha$. Let $\theta \in \{\|\theta - \theta_0\|_2 \le K\varepsilon_n(\alpha)\} \cap \Theta_n(\alpha)$.
Then for all $\alpha \le \alpha' \le \alpha + u_n$

gln(Xn) < 

SO that (2.5) is verified on {||0 — ^olb < ^£nio)} H 0n(ce) as soon as Un < 
an“^/^en(a)~^, Va > 0 when n is large enough. To prove (2.6), we decompose 
Qn{oiY into 0n,j = Qn{aY n {||6»||i G {jy/nen{a), {j + l)^/nen{a))}, j > 0. 
We use Lemma F.2 with a > 1/2 + so that 

E(llellila) < 

and from Lemma 3.2, Lldl^llila) < ^/nen{(y)- Also, for all j > Ji with Ji 
fixed and large enough following from Lemma F.2 we have 

n (||(9||i > j^/n£n{a)\a) < 

for some $c_0 > 0$ independent of $\alpha$. On $\Theta_{n,j}$ define $u_{n,j} = u_n/(j\log j)$ and
construct a covering of $[\alpha, \alpha + u_n]$ with balls of radius $u_{n,j}$; the number of
such balls is of order $N_j = O(j\log j)$. Then since

sup pI ,(0 )(x„)<max sup pi je)M 

\a-a'\<Un \a'-ai\<Un,j 

we have that for all $\theta \in B_{n,j}$ (where $B_{n,j}$ was defined in (A2bis))

Q®,n(A”") < < 2Nj, 

if a is chosen small enough in the definition of Un and 



Q^,J<T")dn(0|a) <iV,n(0 






Let j < Ji, then ||0||i < Jl^/nenia) and 

<3® n(A") < 


and, since by choosing A (in the definition of 0n(a)) large enough IT (0))(a)|Q:) 
^-coJfnel^ia) ^ 


L 




Qi^{X'^)<m{e\a) < e"°A"'^4(«)/2n(0^(ce)|a) < g-coJfn4(«)/2^ 


which implies (2.6). 
Similarly to the case of prior (T2), we verify (A2). The tests are the same
as in Section B.4, since fe if ^ and the argument 

follows the same line, with $\varepsilon_n(\alpha)$ replacing $\varepsilon_n(\tau)$ and $\Theta_n(\alpha)$ replacing $\Theta_n(\tau)$,
although the definitions remain the same. Note that in this case we do not
have to split into $\tau$ large or small in the definition of $\Theta$. Equation (B.10)
is satisfied for all $\alpha \in (1/2, \log n/(16\log\log n)]$, so that condition (A2) is
verified.

The verification of (C1)-(C3) follows the same lines as in the case of prior 
(T2) using the fact that if ||0||i < M, and if 
for all 9 G Qn(oi) and a G Aq, 

inf - ^n{9) > -1 

a<a+Un 

and 

sup in{'llJa,a'{9)) - in{9) < I 

a<a+Un 

The control over $\Theta_n(\alpha)^c$ is done as before by splitting it into the subsets
$\Theta_{n,j}$. Finally, similarly to the preceding sections, $\log N_n(\Lambda_n) = o(n\varepsilon_{n,0}^2)$ for
arbitrarily small $c_2 > 0$.

B.6. Proof of Proposition 3.6. From [7], together with the fact that 
9-kn ~ 6 £ Sk (where Sk denotes the k dimensional simplex) 

and that in Sk the set 



and the covering number of this set with balls of radius in Hellinger 
distance is bounded from above by (Ck/uf)^ so that for all u > A-sJk/n^ 
condition (2.9) is verified. Finally, since $f_0$ is bounded from above and from
below, the Kullback-Leibler divergence is bounded by a constant times the
square of the Hellinger distance, and condition (B1) is verified. Conditions
(C1)-(C3) follow from the above arguments and the remark that when $f_0 \in$
PLao{P,L) then Aq C {A: < A:i(n/logfor some ki large enough, as 
in the case of prior (Tl). 

B.7. Proof of Theorem 2.2. Similarly to the proof of Corollary 2.1 we 
can write 

F,;n(0 : 6(9,Oo) < A„) = E^( ) 

sup G„(A)+o(l), 

AeAo 
where G„(A) =

Then, similarly to the proof of Theorem 2.1, we take a $u_n$-covering of the
set $\Lambda_0$ with center points $\lambda_1, \lambda_2, \ldots, \lambda_{N_n(\Lambda_0)}$ and obtain (from conditions (A1)

and (2.12)) that

sup Gn{X) = sup sup f 

AeAo Ai p(\,\i)<Un Jd(1px^^\(^),^Q)<Snenfi 

N{Aq) 

< E / Qi^{x^)duie\Xi) 

< A^(Ao)(e-*"'^<°+""“' 

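for some $w_n \to \infty$, where in the first inequality we applied that, following
the (adjusted) condition (C3) and the triangle inequality, $\{\theta : d(\psi_{\lambda_i,\lambda}(\theta), \theta_0) \le
\delta_n\varepsilon_{n,0}\} \subset \{d(\theta, \theta_0) \le 2\delta_n\varepsilon_{n,0}\} \cup \Theta_n^c$. We conclude the proof by combining
the two displays.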

C. Some technical Lemmas for priors (T2) and (T3).

Lemma C.l. For every a, t > 0, andC, G (0,1), taker] > c^(3C“^A'/c)^/“ 
(with c = c(a,r) and ci given in (3.4))and define the sets 

(C.l) 0n(a,r) = (Cc/3)enBi + Sn = £nia,T), Rn = Rnia,T), 

with 

= -2^>-^(e-’’’"""), 

and where $B_1 \subset \mathbb{R}^n$, respectively $\mathbb{H}_1^{\alpha,\tau}$, denotes the unit ball of the Hilbert
space $(\mathbb{R}^n, \|\cdot\|_2)$, respectively the reproducing kernel Hilbert space
corresponding to the priors (T2) and (T3). Then

logA^(cCen,0n(a,'r), II • II2) < Sjyne^, 
n(0()(Q;, r)|Q;, r) < and 

||0n(a,'r)||i < 2^?7r2ne^ V 1. 

Moreover, if a > 1/2, then for all Un/r < 1, 

/ 7/ \ —l/(a—1/2) 

(C.2) logn(||6»||i < Un\a,r) > -Ca 

C„<((a-l/ 2 )/ 8¥.(O))-V0-V2). 
Let denote the ball of radius 1 centered at 0 for the $L_1$ norm, and if

(C.3) 0n(a,T) = Un^{^+RnK'"^ = -2$"^ f 

then 

n( 0 ^|a,r) < ) 

for some C > 0 independent of a and r. 


Proof. We follow the lines of the proof of Theorem 2.1 of [33]. Define
$\gamma_n$ such that

$(7„) = U{e : ||0||2 < Ccen/3|a,r) > l|e-eo|| 2 <Ae„|a,r) ^ 

where $\Phi$ denotes the distribution function of the standard normal random
variable. Then we can see that $\gamma_n \ge -R_n/2$ and therefore by Borell's
inequality we have that

(C.4) n(0^|a,r) < l-4>(7„ + i?0 < 1 - ci>(i?„/2) = . 

Then take $(2\zeta c/3)\varepsilon_n$-separated points $h_1, h_2, \ldots, h_N$ contained in $R_n\mathbb{H}_1$
for the $\|\cdot\|_2$ norm, so the $h_i + (\zeta c/3)\varepsilon_n B_1$ balls are separated. Furthermore,
note that following from the tail bound on the Gaussian distribution function
$\Phi(-x) \le e^{-x^2/2}$ we have that

(C.5) $R_n = -2\Phi^{-1}(e^{-\eta n\varepsilon_n^2}) \le \sqrt{8\eta n\varepsilon_n^2}$.
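The bound (C.5) on the radius $R_n$ follows from the Gaussian tail inequality alone and can be checked numerically. The sketch below uses only the Python standard library; the variable `c` stands in for the exponent $\eta n\varepsilon_n^2$ (this reading of the garbled exponent is an assumption), and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def radius(c):
    """R = -2 * Phi^{-1}(exp(-c)), with c playing the role of eta * n * eps_n^2."""
    return -2.0 * STD_NORMAL.inv_cdf(math.exp(-c))


def radius_bound(c):
    """Upper bound sqrt(8 c) implied by Phi(-x) <= exp(-x^2 / 2)."""
    return math.sqrt(8.0 * c)


for c in (1.0, 5.0, 25.0, 100.0):
    print(f"c = {c:6.1f}   R = {radius(c):8.4f}   sqrt(8c) = {radius_bound(c):8.4f}")
```

Writing $e^{-c} = \Phi(-R/2) \le e^{-R^2/8}$ gives $R^2 \le 8c$ directly, which is the inequality the loop illustrates.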

Then similarly to [33] (with C = rj and (/3o,a,T(Ccen/3) < that 

1 > 


This leads to the inequality 

logAf(Ccen,0n(a,r), || • II 2 ) < log iV(2Ccen/3, || • II 2 ) 

(C.6) < log < 5ryne^. 

Finally we note that 

(C.7) ||0n||i < {tRu + {K/Q)enf < 2t‘^rI V 1 < 2Sr^nel V 1 

Let $\alpha > 1/2$, then the above argument implies that with $R_n$ defined as in
(C.3),

n(0^|a,r) < 1 - $(i?„/2) < 
Also

n(6' : ll^lli < Un\a,T) 

= n(6' : ||6»||i < Un/T\a, 1) 

^ 11 ^ ( E > un/i2T) 11 n ^ < 


Vi=A+i 


j<Ji 


2 r Ji 


we choose Ji such that the first factor is bounded from below by 1/2. To do 
so we bound from above 


j=Ji+i 




j=Ji+i 




— g-SMn/(2r)g%- Ej>j^+i i 




i>A+i 

2 j-2a 

< g-s«„/(2r)g-L^2sv,(0)E^>j,+li-“-l/= 


-a+1/2' 


/ Un 2(f{0)J~ 
<exp -s -—^ 


2 T-2a 


+ 


S^J- 


4 a 


We choose “+ 1/2 < (|q, _ so that for all s > 0 


P 


' 00 \ 

Y > Un/{2T) 

j=Jl+l / 


< exp 



4 a J 


< exp 


8r2 J ’ 


where the last inequality comes from choosing $s = (u_n/\tau)\alpha J_1^{2\alpha}$. This
probability is smaller than 1/2 as soon as $\alpha J_1^{2\alpha}u_n^2/(8\tau^2) > \log 2$, i.e. as soon
as $J_1 > (8\log 2/\alpha)^{1/(2\alpha)}(u_n/\tau)^{-1/\alpha}$. Since $\alpha > 1/2$, there exists a
constant $J_0$ such that both constraints are satisfied as soon as
$J_1 \ge J_0((\alpha - 1/2)/(8\varphi(0)))^{-1/(\alpha-1/2)}(u_n/\tau)^{-1/(\alpha-1/2)}$. We can then bound from below


JJpO-<.-l/2|z.|< 
j<Jl ^ 


Un \ 

2rJiy 


= n 

j<Ji 



2tJi j 



For all < 2tJi 



2tJi j 



“ 2 r Ji 
for all > 2r Ji



2rJi J 



> 2 ^>( 1 ) - 1 


C 


which leads to 

n P > exp(-Jic), 


for some $c$ independent of $\alpha$ and $\tau$.


□ 
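The threshold on $J_1$ used in the proof above, which makes the tail probability smaller than $1/2$, is a one-line rearrangement. The worked step below is a sketch that reads the garbled constant in the source as the regularity parameter $\alpha$; that reading is an assumption.

```latex
\begin{align*}
\frac{\alpha J_1^{2\alpha} u_n^2}{8\tau^2} > \log 2
  &\iff J_1^{2\alpha} > \frac{8\log 2}{\alpha}\Big(\frac{u_n}{\tau}\Big)^{-2} \\
  &\iff J_1 > \Big(\frac{8\log 2}{\alpha}\Big)^{1/(2\alpha)}
            \Big(\frac{u_n}{\tau}\Big)^{-1/\alpha}.
\end{align*}
```

Since $u_n/\tau \le 1$ and $\alpha > 1/2$ gives $1/(\alpha - 1/2) > 1/\alpha$, we have $(u_n/\tau)^{-1/\alpha} \le (u_n/\tau)^{-1/(\alpha-1/2)}$, so a single lower bound on $J_1$ of order $(u_n/\tau)^{-1/(\alpha-1/2)}$ covers both constraints.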


Next we show that for the scaling prior (T2) in the case $\alpha + 1/2 < \beta$ the
second part of condition (2.12) holds.

Lemma C.2. For the prior (T2), when $\alpha + 1/2 < \beta$, then $\varepsilon_n(\tau) \gtrsim n^{-(2\alpha+1)/(4\alpha+4)}$
for all T G An and for all 9 q G TiooiP, U Sp{L), 0o / 0 and 5n = o{M~‘^) 
we have 

_ nsnirf _^ . 

-loglldlie-6 »o|| 2 < (J„en(r)}|r, a) 

Proof. First of all note that Snij) > follows automati¬ 

cally from (A. 2). Then for every r G Aq 

- logn(|| 6 » - Ooh < dnEn{T)\T,a) > - log n(|| 6'||2 < (t) |r, a) 

hence for 6n = o{M~‘^) following from Lemma 3.3 the right hand side of the 
preceding display is of higher order than ne^ q. 

□ 


D. Some technical Lemmas for the hyper-prior distributions. 

In this section we collect the proofs of the technical lemmas on the
hyper-prior distribution.
D.1. Proof of Lemma 3.4. Take any $k_0$ satisfying $\varepsilon_n(k_0) \le 2\varepsilon_{n,0}$. Since
from the proof of Lemma 3.1 $n\varepsilon_n(k)^2 \ge k\log\sqrt{k}\,(1 + o(1))$ holds we get

^(^o) > 

For the upper bound we note that following from Lemma 3.1 


OO OO 1 1 

7 r(A:) < ^ g-C 3 (n/logn)T+^ 

fc=£:(77,/log n) /c=e:(n/log n) 


< 

rs^ 


(Twr 


< p-w^ne 


2 

n,0 


D.2. Proof of Lemma 3.5. As a first step choose an arbitrary $\tau_0 \in \Lambda_0(w_n)$
with $\varepsilon_n(\tau_0) \le 2\varepsilon_{n,0}$. Then any $\tau$ satisfying $\varepsilon_n(\tau) \le 2\varepsilon_n(\tau_0)$ belongs to the set
$\Lambda_0(w_n)$. Next consider $\tau$ satisfying $\varepsilon_n(\tau) > 2\varepsilon_n(\tau_0)$. Furthermore, note that
for any $\tau_1, \tau_2 > 0$ the RKHSs corresponding to the priors $\Pi(\cdot|\tau_1)$, $\Pi(\cdot|\tau_2)$ are
the same, i.e. $\mathbb{H}^{\tau_1} = \mathbb{H}^{\tau_2}$. Following from the definition of the concentration
inequality (3.2)


- logn(||6» - 6 »o||2 < KEn{T)\T) 

-.M in . JI^IIh-- logn(|| 0||2 < (A:/2)£„(r)|r) 

hm-^: \\h-eo\\2<Ke„iT)/2 

- . ,n ^ , ,('ro/'r)^||/i||^ro -logn(||0||2 < (ro/r)A:e„(ro)|ro) 

heM 0: ||/l—6o||2<A^£n(To) 

< max|(^)2, (^)"“|(^ ^inf^ ||/i||ii.o - logn(|| 0||2 < A:£n('ro)|To)) 

\\h — do\\2<KeniTo) 

< -max{(ro/r)^ (ro/r)"“}logn(||6'- ^olb < KEn{To)\To) 

< max{(ro/r)^, (ro/r)""}ne„(ro)^ 


Hence Enir)'^ < max{(ro/T)^, (tq/t) "}£„(ro)^, so one can conclude that 
[(2/u;n)To, (u;n/2)^/“To] n A„ C Ao (wn). Therefore following from the proof 
of Lemma 3.2 we have nEnirY > V and that [tq/2,2tq\ G A„, 

hence 


/■2to 

/ 

Jto/2 


n{T)dT > e 


2/(l+2t:.) 




concluding the first part of (H1).

The second assumption in (H1) holds trivially for $\tilde{\pi}$ satisfying the upper
bounds in the lemma:


/ 


g-coCQiB^ne^ Q 


n{T)dT < e-^°^Xo 


' L 


OO 

2 2 HT)dT < 

CQCQW^nef^^Q 
D.3. Proof of Remark 3.3. One can easily see that

Nn{An) < g ^ 

Then by noting that Wn = o{wn) we get our statement. The upper bound 
on the hyper-entropy of the set Aq follows immediately from the proof of 
Proposition 3.2. 

D.4. Proof of Lemma 3.6. Similarly to the proof of Lemma 3.5 we choose
an arbitrary $\alpha_0 \in \Lambda_0(w_n)$ with $\varepsilon_n(\alpha_0) \le C\varepsilon_{n,0}$. From Lemma 3.2 we have
that ao > /3+o(l) in case Oq G S^{L)U'H{f3, L), since > e„(ao) > 

j^-ao/(l+2oo)_ 

First assume that ao < log n, then for any a G [ao/(l+2 log Wn /log n), ao] 
we have that 


- logn(||0 - 6 'o||2 < Ken{a)\a) 


< 


< 

rs_/ 


inf 

/ieH“0:||h-6)o||<A£„(ao) 
inf 


Ih“o + {Aren(ao)} 


„ , 1 ^. 0 -logn(|| 0||2 

.hm°‘0:\\h-9o\\<Ken{ao) 

(D.l) < {nen(ao)^}"°/" < wlnen{aof, 


aoja 


hence [ao/(l + 2 log log n), ao] C Aq. Then the first part of condition 
(HI) holds for tt satisfying the lower bound in the statement, since 


r-ao 

(D.2) / 

Jao/(l+2 log Wn /log n) 


TT{a)da > 


gp log Wn 

(logre)e'^20o 


> -2C2logn ^ 


where in the last inequality we used that by definition e'^Q > n ^logn. In 
case (logn)/2 < a < logn < ao we have that en(ao)~^^“ ^ < i 

hence similarly to (D.l) we have that a G Ap so the statement follows from 
(D.2) with ao replaced by logn. 

Finally, we show that the second assumption in (H1) also holds if the
upper bound on $\tilde{\pi}$ is satisfied
E. Some technical Lemmas for the nonparametric regression
model.

Lemma E.1. For $\theta_0 \in \ell_2(M)$ (or equivalently $f_0 \in L_2(M)$) we have for
Un < n“^/^/log(n) 


(E.l) 


sup sup Qi,T,ni^n) = 0{l). 

a,T ,\\e-eo\\2<£n{a,T) 


Proof. Let us denote by $\lambda$ the hyper-parameters $\alpha$ or $\tau$, and by $\rho(\lambda, \lambda')$
the losses $|\alpha - \alpha'|$ or $|\log\tau' - \log\tau|$. Furthermore, we introduce the notations

n n 

y''0A,A'(0j)ej(ti); 'ifx,i= sup VV’A,A'(6'i)ej(L). 

p(A,A')<nn^ p(A,A')<«n^ 

For instance in case of A = a this is 

n n 

i=i i=i 

while for A = r 

n n 

i=i i=i 

Then one can easily obtain (using the Cauchy-Schwarz inequality) that both in
the case of $\lambda = \alpha$ and $\lambda = \tau$ we have


“") A^^iej(L) < logn^ |6ljej(L)| 

i=i i=i 

(E.2) < 2M||0||2«nn“"+^/^logn. 


Writing out the definition of


(E.3) = [ 

Jw 


sup 


n 




^ J. X /- 

p(A,A')<'«n V27rcr 
We deal with the one-dimensional integrals separately


/ 

Jr 


1 




sup 


p(A,A')<“n a/^CT^ 

< 


' dxi 


L 




i<ll V27rcr2 


dxi + 


r^x,i 1 


'th 




dxi 


< 


—X.i' 


Jr V 27 ra‘^ 

< 1 + 2 M|| 0 || 2 nn(T“^n“"'''^/^ log n, 

where the last inequality follows from (E.2). Note that for $\theta_0 \in \ell_2(M)$ and
$\theta \in \mathbb{R}^n$ satisfying $\|\theta - \theta_0\|_2 \le \varepsilon_n$ we have

$$\|\theta\|_2 \le \|\theta_0\|_2 + \varepsilon_n \le 2M.$$

Therefore the right hand side of (E.3) is bounded from above by
(E.4) 

(l + 2M||0||2U„(T-^n“"+^/2 < e2M||0||2«n<x-ln“-+3/2logn ^ 

□ 
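The last step of the proof of Lemma E.1 bounds a product of $n$ one-dimensional integrals, each of the form $1 + x$, by $e^{nx}$. The sketch below checks this elementary inequality on the log scale and shows that $nx$ stays constant in $n$. It assumes, as a reading of the garbled source, that the per-observation excess is $x = 2M\|\theta\|_2 u_n\sigma^{-1}n^{1/2}\log n$ and that $u_n = n^{-3/2}/\log n$; both the exponents and the function names are assumptions, not the authors' stated values.

```python
import math


def log_product_and_bound(n, x):
    """Return (n * log(1 + x), n * x), i.e. the logs of (1 + x)^n and exp(n x)."""
    return n * math.log1p(x), n * x


def per_observation_excess(n, M, theta_norm, sigma, scale=1.0):
    """x = 2 M ||theta||_2 u_n n^{1/2} log(n) / sigma with u_n = scale * n^{-3/2} / log(n).

    The exponents 1/2 and -3/2 are assumed readings of the source.
    """
    u_n = scale * n ** -1.5 / math.log(n)
    return 2.0 * M * theta_norm * u_n * math.sqrt(n) * math.log(n) / sigma


for n in (10, 1000, 10 ** 6):
    x = per_observation_excess(n, M=2.0, theta_norm=1.5, sigma=1.0)
    log_lhs, log_rhs = log_product_and_bound(n, x)
    print(f"n = {n:8d}   log (1+x)^n = {log_lhs:.6f}   n x = {log_rhs:.6f}")
```

Under these assumptions $nx = 2M\|\theta\|_2/\sigma$ for every $n$, so the product is bounded by a constant, which is the $O(1)$ claim of the lemma.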

Lemma E.2. Consider priors (T2) and (T3). For Un < n“^/logn if 
X = a, and Un < A n“^/^ if X = t < Tn we have that 

f Q^r,o.,ni^nMd9\T,a) < 

J&^ 

where £n = £{t) or £n{a) and p > C 2 ( 12 iC/c)^/“. 

Proof. Following the proof of Lemma E.1 and using the Cauchy-Schwarz
inequality we get that

[ Qf„„(T'„)n(d0|r,a) < / l'^•ld^(0|r, a) 

< < 3e-0/2)n£2_ 
Lemma E.3. In the nonparametric regression model for t < Tn (for some
—>■ ooj, log n, and 0 G &n{<^,T) (defined in (C.l)J, there 

exist tests ‘Pn(d) such that 

(E.5) sup sup [ 

a>0 e'ee„(a,r) JXn 

r<rn ||0_0'||2<||0_0q||2/18 


Proof. First of all note that the likelihood ratio test (using the sequence 
notation) 

n n n n 

(E.6) ipn{9) = I\\^{xi-'^ejej{ti)Y - '^{xi-'^eQjej{ti)Y . 

i=l j=l i=l j=l 

satisfies for all 6 G M” 

sup Eq,(i - (pn{9)) < exp{-^||6l - 6»o||i} 

0'eM":||e-e'||2<||0-eo||2/i8 v / / 

and Eg^ipn{0) < exp{ —^||0 — 6*o|||}, see for instance Section 7.7 of [13]. 

For notational convenience let us again denote by $\lambda$ both of the
hyper-parameters $\alpha$ and $\tau$, then we have that


(E.7) 


l-iPn{0)) sup 




< 


l|x„|| 2 <r„n 


p(A,A')<«n (2vrcr2)"/2 
1 - (pniO 


'dyir. 


1 

2^2 


X sup [e^a 

p(A,A^)<'u 


^EILi [{E"=i0'o(h)-xip-{E?=ibA,A'(e')e,hd-xd- 




x„\\2>Tnn p{X,y)<Un (27rcr2)”/2 


1 =,-AEiLi{E?=ibA,v(0')o(h)-xiP 


e 20 -^ 


dxn 

dXn- 


We deal with the two terms on the right hand side separately.

First we examine the first term, where it is enough to show that the
multiplicative term (with the sup) is bounded from above by a constant. Using
the Cauchy-Schwarz and triangle inequalities and the assumption $|e_j(t_i)| \le M$
we get that
n n n

I “ {'^'^\,x'{^'j)ej{ti) - Xi}"^ 

p{\,X')<Un 



The right hand side of the preceding display following from (E.2) and ||x„ ||2 < 
n is bounded from above by 

(E. 8 ) 

2 M^|| 6 »'|| 2 Unra“"+Hlogn)( 2 rer„ + Cy/n\\9'\\2) < log n = 0 ( 1 ). 


Therefore it remains to deal with the second term on the right hand side
of (E.7). Since $\|\theta'\|_2 \lesssim \sqrt{n}\,\tau_n\varepsilon_n$ (following from $\theta' \in \Theta_n(\lambda)$) we have that
suPp(A,A')<«n II^A.A'(^')II < n'^^Wh ^ n'^"+^^‘^Tn£n = o(rer„). Therefore 


1 


^||x„||2>r„n(2vr 0-2)02 

(E.9) < [ 


e 27 ^ dxr, 


II^Tj, II 2 




e-^dx„ < 2-er-^rlli2a-) 


where the right hand side is of smaller order than exp{—(l/ 2 )n ||0 — 6 *o|| 2 }; 
since || 6 l - 6 »o ||2 < H^olb + ll^'lb < 1 + r„v^e„(A) = o(r„A/n). □ 
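For a likelihood ratio test of the kind in (E.6), the error probability under Gaussian noise is available in closed form, which makes the exponential bound easy to inspect. The sketch below is illustrative only: it assumes a test that rejects $\theta_0$ when the residual sum of squares at $\theta$ is smaller than at $\theta_0$, it works with the vectors of fitted values at the design points, and the constant $1/8$ in the exponent comes from the tail bound $\Phi(-x) \le e^{-x^2/2}$ rather than from the (garbled) constants of the lemma. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def lr_test_errors(f_theta, f_theta0, sigma):
    """Exact error of the two-point likelihood ratio test and an exponential bound.

    f_theta, f_theta0: fitted values at the design points under theta and theta0.
    Under theta0, x = f_theta0 + sigma * Z, and the test rejects when
    ||x - f_theta||^2 < ||x - f_theta0||^2, which happens with probability
    Phi(-d / (2 sigma)), where d = ||f_theta - f_theta0||_2. By symmetry the
    error under theta is the same.
    """
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(f_theta, f_theta0)))
    exact = NormalDist().cdf(-d / (2.0 * sigma))
    bound = math.exp(-d * d / (8.0 * sigma * sigma))
    return exact, bound


exact, bound = lr_test_errors([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], sigma=1.0)
print("exact error:", exact)
print("exp bound  :", bound)
```

When the design is such that $d^2$ is of order $n\|\theta - \theta_0\|_2^2$, the bound decays exponentially in $n\|\theta - \theta_0\|_2^2$, which is the form of decay required of the tests in (A2).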


Lemma E.4. Consider the nonparametric regression model and priors
of type (T1)-(T3). Take any $\tau_n \to \infty$ and $0 < \tau \le \tau_n$. Then for $u_n \le$

log n 

sup PqA inf 4 (V’a,a'(^))-4(6'o) <-C 3 ne„(A)^| = 
\\9-9o\\<Ken(X) ^ pi\,X'}<Ur, J 

for C 3 > 2 + 3a-^K^f2. 


Proof. By the triangle inequality we have that

(E.10) $|\ell_n(\psi_{\lambda,\lambda'}(\theta)) - \ell_n(\theta_0)| \le |\ell_n(\psi_{\lambda,\lambda'}(\theta)) - \ell_n(\theta)| + |\ell_n(\theta) - \ell_n(\theta_0)|$.

We deal with the two terms on the right hand side separately.

First consider the first term on the right hand side of (E.10) and note that
in the case of prior (T1) it is zero. For priors (T2)-(T3), following from Lemma
E.3, we have that for $\|x_n\|_2 \le n\tau_n$ it is bounded above by a constant, while
^e”(||x„||2 > riTn) < 6“'"”' 

For the second term on the right hand side of (E.10) we apply Chernoff's
inequality and K{9, Oq) = (y~'^\\9 — ^olli

sup Pl{ln{e) - 4(0o) < -(3a-2i^V2 + l)nel] 
e&\\e-eo\\2<Ker, 


< 

sup 

pn 

^00 

U0) - U0o) - E^oi^niO) - £n{9o)] < 


9£\\0-0oh<Ke„ 



< 

sup 


/2+l)nel {J2]=i{0o,j-0j)ej,Z)n'j 


0 &\\ 0 - 0 o\\ 2 <Ke„ 



< 

sup 


r-2i^2/2+l)n4g^-2(n/2)||eo-e||| < 


0&\\0-0o\\2<Ker, 




K'^ + 20-2 
^^2 



where Z denotes an n dimensional vector of iid standard normal random 
variables. □ 

F. Some Technical Lemmas in the density case. 

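Lemma F.1. Let $f_0 = f_{\theta_0}$, $f_\theta$ with $\|\theta - \theta_0\|_1 < +\infty$, and $\theta, \theta_0 \in \ell_2$, then

(F.1) $V_2(f_0, f_\theta) \le \|f_0\|_\infty\|\theta - \theta_0\|_2^2$,

$d^2(f_0, f_\theta) \ge \exp(-c_1\|\theta - \theta_0\|_1)\|\theta - \theta_0\|_2^2$.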

Proof. We have, following [23] (see also the supplement, Section 3.2,
Proof of Proposition 5 of [7]), if $\|\log f_\theta\|_\infty \le M$

(F.2) K{fo, fe) = {9o - 9, ^ifo ))2 - c(0o) + c(0), 


with 

c{9) — c{ 9 q) = log ^ J dx^ 

<l-{9o- 9,ip{fo)) + ||0 - 0o|li 

Since {9o - 9, ip{fo))l < ||/o||^|| 6 » - 6 »o|| 2 , this leads to 
Similarly


V2{foJe)<2{ Ee, 


2 -\ 






< 2 c 2 ||/o||Lll^-^o||i. 


Finally, using the inequality $|e^v - e^w| = e^v|1 - e^{w-v}| \ge e^v e^{-|w-v|}|w - v|$
and that $\log f_0 > -\infty$, we have that $d(f_0, f_\theta)$ is bounded from below by

r - eoj)ipj{x) + c{e) - c{Bo)fdx 

Jo . 

> e-2||<p||oo||0-0ol|i||0 _ 


where in the last inequality we applied the orthonormality of the basis 
and the inequality □ 


Lemma F.2. Let n(-|a,r) be the Gaussian prior with a > 1/2 and r G 
(n““,n^) then 

(F.3) F;(||6'||i|a,r) = TE{\\e\\i\a,T = 1) = tAo, < oo, 

n (||0||i > t + r^ala, r) < e 


with 

o-a,r < 0-1/2,iL Va>l/2,r>0. 

Moreover, for all $K_n > 0$ going to infinity and all $M > 0$

(F.4) u(\\B - OoWi > M+\\eoh + ^\\9-eo\\2\a,T) < e-^^. 

Proof. The first part of Lemma F.2 is Borell's inequality associated to
the Banach space $\ell_1 = \{\theta;\ \sum_i |\theta_i| < +\infty\}$ since

$$\Pi(\|\theta\|_1 < +\infty \mid \alpha, \tau) = 1, \quad \forall \alpha > 1/2,\ \tau > 0.$$

To prove (F.4), let $K_n > 0$ then

\\B - BoWi < ^/K\\B - Boh + Y1 

j ^ ^71 j ^ 
and


n(^ \ej\>M\a,T) = p 

j^Kn 


> m/t 

j>Kn 


< e 2U2 


where $Z_j \sim \mathcal{N}(0, 1)$. □

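Lemma F.3. In the density estimation problem with a prior on $f_\theta$ defined
by (3.15), if there exists $M > 0$ and $\varepsilon_n$ such that $n\varepsilon_n^2 \to +\infty$, and

$$\{\|\theta - \theta_0\|_2 \le \varepsilon_n\} \cap \{\|\theta\|_1 \le M\} \subset B_n(\theta_0, M_2\varepsilon_n, 2),$$

then there exists $a_0 > 0$ such that for all $\theta \in \{\|\theta - \theta_0\|_2 \le \varepsilon_n\} \cap \{\|\theta\|_1 \le M\}$,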

PI (4(0) -4(0o) < -2M2nel) < 

Proof. Let 0 G {||0 - 0o||2 < e^j n {||0||i < M}, 

^0”{4(0)-4(0o)<-2M2n4} 

< (l + iL(/o,/0) + P2(/o,^)e2*"''""°llill^ll°°)” 

< g-“0n£2 


for some $a_0 > 0$ proportional to $M_2$.


□ 